NOVEL HCV NON-STRUCTURAL POLYPEPTIDE

FIELD OF THE INVENTION 

The present invention relates to polypeptides comprising a mutant
non-structural Hepatitis C virus ("HCV") polypeptide useful as immunogenic compounds
for use against HCV, methods of preparing and using the same, and immunogenic
compositions comprising the same. The present invention also relates to compositions
comprising (a) a mutant non-structural HCV polypeptide and (b) a viral polypeptide
that is not a non-structural HCV polypeptide, and to methods of using these compositions.

BACKGROUND OF THE INVENTION 

HCV is now recognized as the major agent of chronic hepatitis and liver disease
worldwide. It is estimated that HCV infects about 400 million people worldwide,
corresponding to more than 3% of the world population.

Hepatitis C virus ("HCV") is a small enveloped RNA flavivirus, which contains
a positive-stranded RNA genome of about 10 kilobases. The genome has a single
uninterrupted ORF that encodes a protein of 3010-3011 amino acids. The proteins
of HCV include the structural proteins, namely a core protein (C), which is highly
immunogenic, and two envelope proteins (E1 and E2), which likely form a heterodimer
in vivo, as well as the non-structural proteins NS2-NS5. It is known that the NS3
region of the virus is important for post-translational processing of the polyprotein
into individual proteins, and the NS5 region encodes an RNA-dependent RNA polymerase.

Virus-specific T lymphocytes, along with neutralizing antibodies, are the
mainstay of the antiviral immune defense in established viral infections. Whereas
CD8+ cytotoxic T cells eliminate virus-infected cells, CD4+ T helper cells are essential
for the efficient regulation of the antiviral immune response. CD4+ T helper cells
recognize specific antigens as peptides bound to autologous HLA class II molecules
(viral antigens or particles are taken up by professional antigen-presenting cells,
processed to peptides, bound to HLA class II molecules in the lysosomal compartment,
and transported back to the cell surface). Several observations support an important
role of CD4+ T cells in the elimination of HCV infection. Tsai et al., 1997 Hepatology
25:449-458; Diepolder et al. 1995 Lancet 346:1006-1007; Missale et al. 1996 JCI
98:706-714; Botarelli et al. 1993 Gastro 104:580-587; Diepolder et al. 1997 J. Virol.
71:6011. Immunogenic peptides usually have a minimal length of 8-11 amino acids.
However, since the peptide binding groove of HLA class II molecules seems to be open
at both ends, longer peptides are tolerated. Thus peptides eluted from HLA class II
molecules are typically in the range of 15-25 amino acids. HLA class II molecules are
extremely polymorphic and each allele seems to have its individual requirements for
peptide binding. Thus the HLA class II repertoire of a given individual determines
which viral peptides can be presented to T cells. Recognition of the specific
HLA-peptide complex by the T cell receptor, accompanied by appropriate costimulatory
signals, leads to T cell activation, secretion of cytokines, and T cell proliferation.

Numerous studies demonstrate that HLA class II restricted CD4+ responses can be
determined by stimulating peripheral blood mononuclear cells with recombinant viral
antigens or peptides. Botarelli et al., (1993) Gastroenterology 104:580-587; Ferrari et
al., (1994) Hepatology 19:286-295; Minutello et al., (1993) J. Exp. Med. 178:17-25;
Hoffmann et al., (1995) Hepatology 21:632-638; Iwata et al., (1995) Hepatology
22:1057-1064; and Tsai et al., (1995) Hepatology 21:908-912.

Polyclonal multispecific CD8+ T cell responses have been detected in patients
with chronic hepatitis C. Additionally, CD8+ CTLs were shown to be important in
resolving acute HCV infection in chimpanzees (Cooper et al. Immunity 1999). About
50% of patients with chronic hepatitis C demonstrate a detectable virus-specific CD4+
T cell response, which is most frequently directed against HCV core and/or NS4 and
tends to be more common in patients who achieve sustained viral clearance during
interferon-α therapy.

Depending on the pattern of lymphokines, CD4+ T helper cells have been
classified as TH1, TH0, or TH2. Cytokines of the TH1 type are typically IFN-γ,
lymphotoxin, and interleukin-2 (IL-2), which are believed to support activation of
virus-specific CD8+ T cells and natural killer cells. The TH2 cytokines IL-4, IL-5,
IL-10, and IL-13 are important for B cell activation and differentiation, thus inducing a
humoral immune response.

During acute hepatitis C infection a strong and sustained TH1/TH0 response to
NS3 and possibly to other nonstructural proteins is associated with a self-limited course
of the disease. Diepolder et al., (1995) Lancet 346:1006-1007, showed all CD4+ T cell
clones to have a TH1 or TH0 cytokine profile, suggesting that the clones support
cytotoxic immune mechanisms in vivo. The majority of CD4+ T cell clones responded
to a relatively short segment of NS3, namely amino acids 1207-1278, suggesting that
this region of NS3 is immunodominant for CD4+ T cells. More than 70% of those
who contract HCV develop chronic infection and hepatitis, and a significant portion of
them progress to cirrhosis and eventually hepatocellular carcinoma. The only approved
therapy at present is a 6- to 12-month course of interferon α, which leads to sustained
improvement in only 20% of patients. So far, no commercial vaccine is available.

Thus, there remains a need for compositions and methods capable of promoting 
anti-HCV responses. 

SUMMARY OF THE INVENTION

In one aspect, the present invention relates to isolated polypeptides comprising
mutant hepatitis C ("HCV") polypeptides comprising at least portions of NS3, NS4,
and NS5. In a preferred aspect, NS3 is encoded by a nucleic acid sequence having an
N-terminal deletion to remove the catalytic domain. The NS mutant polypeptides can
include NS3, NS4a, NS4b, NS5a, NS5b or portions thereof. For example, in various
embodiments, the mutant NS polypeptide comprises NS3, NS4 (NS4a and NS4b) and
NS5 (NS5a and NS5b). In other embodiments, the NS polypeptide consists of NS3 and
NS4 (for example, NS4a and/or NS4b) or NS3 and NS5 (for example, NS5a and/or
NS5b). Other combinations of full-length or fragments of non-structural components
are also contemplated.

In another preferred aspect, the polypeptides further comprise a viral
polypeptide that is not a non-structural HCV polypeptide. Such polypeptides are
preferably C, or antigenic fragments thereof, more preferably, truncated C of HCV.
Other polypeptides are preferably E, or antigenic fragments thereof, more preferably,
E1 or E2 of HCV. Such polypeptides need not be encoded by a natural HCV genome,
and include, for example, truncated or otherwise mutant HCV polypeptides or
polypeptides derived from other genomes, such as, for example, polypeptides of HBV.

Thus, the invention includes an isolated mutant non-structural ("NS") HCV polypeptide
comprising a polypeptide having a mutation in the catalytic domain of NS3 that
functionally disrupts the catalytic domain. The mutation can be, for example, a
deletion or a substitution mutation. In certain embodiments, the mutant NS polypeptide
comprises NS3, NS4 and NS5. In other embodiments, the mutant NS polypeptides
described herein further comprise a second viral polypeptide that is not NS3, NS4, or
NS5 of HCV, for example an HCV Core polypeptide ("C"), or fragment thereof, or an
HCV envelope protein ("E"), for example E1 and/or E2. In certain embodiments, C is
truncated (e.g., at amino acid 121).

In another aspect, the present invention relates to compositions comprising any
of the mutant hepatitis C ("HCV") polypeptides described herein, for example
polypeptides comprising at least portions of NS3, NS4, and NS5. In a preferred aspect,
NS3 is encoded by a nucleic acid sequence having an N-terminal deletion to disrupt the
function of the catalytic domain, for example by removing this domain. In another
preferred aspect, the polypeptides further comprise a viral polypeptide that is not a
non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic
fragments thereof, more preferably, truncated C of HCV. Other polypeptides are
preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV. Such
polypeptides need not be encoded by a natural HCV genome, and include, for example,
truncated or otherwise mutant HCV polypeptides or polypeptides derived from other
genomes, such as, for example, polypeptides of HBV. In another aspect, the invention
includes a composition comprising (a) any of the polypeptides described herein; and (b)
a pharmaceutically acceptable excipient (e.g., carrier and/or adjuvant).

In another aspect, the invention includes an isolated and purified polynucleotide
which encodes any of the mutant HCV polypeptides described herein. In certain
embodiments, the invention includes a composition comprising (a) the isolated purified
polynucleotide encoding any of the mutant HCV polypeptides; and (b) a
pharmaceutically acceptable excipient. The polynucleotide can be, for example, DNA,
or can be in a plasmid. Additionally, the polynucleotides described herein may
be included in an expression vector as shown in the attached Figures and Sequence
Listings.

In another aspect, the present invention relates to host cells transformed with
expression vectors comprising a nucleic acid sequence encoding a mutant HCV
polypeptide comprising at least portions of NS3, NS4, and NS5. In a preferred aspect,
the expression vectors of the host cells further comprise at least one nucleic acid
sequence encoding a viral polypeptide that is not a non-structural HCV polypeptide.
Such polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded
by a natural HCV genome, and include, for example, truncated or otherwise mutant
HCV polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect the nucleic acid sequences of the
expression vectors are coexpressed. In yet another preferred aspect, the host cells are
yeast cells or mammalian cells.

In another aspect, the present invention relates to expression vectors comprising
a nucleic acid sequence encoding a mutant HCV polypeptide comprising NS3, NS4,
and NS5. In a preferred aspect, the expression vectors further
comprise at least one nucleic acid sequence encoding a viral polypeptide that is not a
non-structural HCV polypeptide. Such polypeptides are preferably C, or antigenic
fragments thereof, more preferably, truncated C of HCV. Other polypeptides are
preferably E, or antigenic fragments thereof, more preferably, E1 or E2 of HCV.
Importantly, such polypeptides need not be encoded by a natural HCV genome, and
include, for example, truncated or otherwise mutant HCV polypeptides or polypeptides
derived from other genomes, such as, for example, polypeptides of HBV.

In another aspect, the present invention relates to methods of preparing a mutant HCV
polypeptide. In a preferred aspect, the method comprises the steps of transforming a
host cell with an expression vector, said vector comprising a nucleic acid sequence
encoding a mutant HCV polypeptide comprising at least portions of NS3, NS4, and
NS5, and isolating said polypeptide. In another preferred aspect the HCV polypeptide
further comprises a viral polypeptide that is not a non-structural HCV polypeptide.
Such polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded by
a natural HCV genome, and include, for example, truncated or otherwise mutant HCV
polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect the host cells are yeast cells or
mammalian cells.

In another aspect, the present invention relates to antibodies which specifically
bind to a mutant HCV polypeptide comprising NS3, NS4, and NS5, and to methods of
making and using the same. In a preferred aspect, the HCV polypeptide further
comprises a viral polypeptide that is not a non-structural HCV polypeptide. Such
polypeptides are preferably C, or antigenic fragments thereof, more preferably,
truncated C of HCV. Other polypeptides are preferably E, or antigenic fragments
thereof, more preferably, E1 or E2 of HCV. Such polypeptides need not be encoded
by a natural HCV genome, and include, for example, truncated or otherwise mutant HCV
polypeptides or polypeptides derived from other genomes, such as, for example,
polypeptides of HBV. In another preferred aspect, the antibody is either monoclonal or
polyclonal.

In yet another aspect, the invention includes a method of preparing a mutant NS HCV
polypeptide, wherein the method comprises the steps of (a) transforming a host cell with
any of the expression vectors described herein, under conditions wherein the polypeptide
is expressed; and (b) isolating the polypeptide. The host cell can be, for example, a yeast
cell, a mammalian cell, a plant cell or an insect cell. The polypeptide can be expressed
and isolated intracellularly or can be secreted and isolated from the surrounding
environment.

In a still further aspect, a method of eliciting an immune response in a subject is
provided. The immune response can be elicited by administering any of the
polynucleotides and/or polypeptides described herein in one or multiple doses.

These and other embodiments of the subject invention will readily occur to 
those of skill in the art in light of the disclosure herein. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 shows the cloning scheme for generating pCMV-NS35.

FIG. 2 shows the 9621bp vector pCMV-NS35.

FIG. 3 shows the nucleic acid sequence of pCMV-NS35 (SEQ ID NO:1), including the
nucleic acid sequence of the NS35 ORF, and also the translation of NS35 (SEQ ID
NO:2).

FIG. 4 shows the 9621bp pCMV-delNS35.

FIG. 5 shows the nucleic acid sequence of pCMV-delNS35 (SEQ ID NO:3), including
the nucleic acid sequence of the delNS35 ORF, and also the translation of the delNS35
polypeptide (SEQ ID NO:4).

FIG. 6 shows the 4276bp pCMV-II.

FIG. 7 shows the nucleic acid sequence of pCMV-II (SEQ ID NO:5).

FIG. 8 shows the 6300bp pCMV-NS34A.

FIG. 9 shows the nucleic acid sequence of pCMV-NS34A (SEQ ID NO:6), including
the nucleic acid sequence of the NS34A ORF, and also the translation of NS34A (SEQ
ID NO:7).

FIG. 10 shows the cloning scheme for generating pd.ΔNS3NS5.

FIG. 11 shows the nucleic and amino acid sequences of pd.ΔNS3NS5 (SEQ ID NO:8
and 9).

FIG. 12 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.

FIG. 13 shows the cloning scheme for generating pd.ΔNS3NS5.pj.

FIG. 14 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj (SEQ ID
NO:10 and 11).

FIG. 15 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5 polypeptide.

FIG. 16 shows the cloning scheme for generating pdΔNS3NS5.pj.core121RT and
pdΔNS3NS5.pj.core173RT.

FIG. 17 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core121 (SEQ
ID NO:12 and 13).

FIG. 18 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core173 (SEQ
ID NO:14 and 15).

FIG. 19 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5.core121 and ΔNS3NS5.core173 polypeptides. Lanes 1 and 7 show See
Blue Standards. Lane 2 shows control yeast plasmid. Lanes 3 and 4 show
ΔNS3NS5.core121RT polypeptide, colonies 1 and 2. Lanes 5 and 6 show
ΔNS3NS5.core173RT polypeptide, colonies 3 and 4.

FIG. 20 shows the cloning scheme for generating pdΔNS3NS5.pj.core140RT and
pdΔNS3NS5.pj.core150RT.

FIG. 21 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core140 (SEQ
ID NO:16 and 17).

FIG. 22 shows the nucleic and amino acid sequences of pd.ΔNS3NS5.pj.core150 (SEQ
ID NO:18 and 19).

FIG. 23 shows the Western blot of proteins expressed by S. cerevisiae strain AD3
transformed with pd.ΔNS3NS5.pj, specifically demonstrating the expression of
ΔNS3NS5core140 and ΔNS3NS5core150 polypeptides. Lane 1 shows See Blue
Standards. Lanes 2 and 3 show ΔNS3NS5core140RT polypeptide, colonies 5 and 6.
Lanes 4 and 5 show ΔNS3NS5core150RT polypeptide, colonies 7 and 8. Lane 6 shows
control yeast plasmid. Lane 7 shows ΔNS3NS5core121RT polypeptide, colony 1.
Lane 8 shows ΔNS3NS5core173RT polypeptide, colony 5.

DETAILED DESCRIPTION OF THE INVENTION 

The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of molecular biology, microbiology, recombinant DNA
techniques, and immunology, which are within the skill of the art. Such techniques are
explained fully in the literature. See e.g., Sambrook, et al., MOLECULAR CLONING:
A LABORATORY MANUAL (1989); DNA CLONING, VOLUMES I AND II (D. N.
Glover ed. 1985); OLIGONUCLEOTIDE SYNTHESIS (M. J. Gait ed., 1984);
NUCLEIC ACID HYBRIDIZATION (B. D. Hames & S. J. Higgins eds. 1984);
TRANSCRIPTION AND TRANSLATION (B. D. Hames & S. J. Higgins eds. 1984);
ANIMAL CELL CULTURE (R. I. Freshney ed. 1986); IMMOBILIZED CELLS AND
ENZYMES (IRL Press, 1986); B. Perbal, A PRACTICAL GUIDE TO MOLECULAR
CLONING (1984); the series, METHODS IN ENZYMOLOGY (Academic Press,
Inc.); GENE TRANSFER VECTORS FOR MAMMALIAN CELLS (J. H. Miller and
M. P. Calos eds. 1987, Cold Spring Harbor Laboratory), Methods in Enzymology Vol.
154 and Vol. 155 (Wu and Grossman, and Wu, eds., respectively); Mayer and Walker
eds. (1987), IMMUNOHISTOCHEMICAL METHODS IN CELL AND
MOLECULAR BIOLOGY (Academic Press, London); Scopes, (1987), PROTEIN
PURIFICATION: PRINCIPLES AND PRACTICE, Second Edition (Springer-Verlag,
New York); and HANDBOOK OF EXPERIMENTAL IMMUNOLOGY, VOLUMES
I-IV (D. M. Weir and C. C. Blackwell eds. 1986).

WO 01/38360    PCT/US00/32326

It must be noted that, as used in this specification and the appended claims, the
singular forms "a", "an" and "the" include plural referents unless the content clearly
dictates otherwise. Thus, for example, reference to "an antigen" includes a mixture of
two or more antigens, and the like.

I. Definitions

In describing the present invention, the following terms will be employed, and 
are intended to be defined as indicated below. 

The term "hepatitis C virus" (HCV) refers to an agent causative of Non-A,
Non-B Hepatitis (NANBH). The nucleic acid sequence and putative amino acid sequence of
HCV are described in U.S. Patent Nos. 5,856,437 and 5,350,671. The disease caused by
HCV is called hepatitis C, formerly called NANBH. The term HCV, as used herein,
denotes a viral species of which pathogenic strains cause NANBH, as well as
attenuated strains or defective interfering particles derived therefrom.

HCV is a member of the viral family Flaviviridae. The morphology and
composition of Flavivirus particles are known, and are discussed in Reed et al., Curr.
Stud. Hematol. Blood Transfus. (1998), 62:1-37; HEPATITIS C VIRUSES IN FIELDS
VIROLOGY (B.N. Fields, D.M. Knipe, P.M. Howley, eds.) (3d ed. 1996). It has
recently been found that portions of the HCV genome are also homologous to
pestiviruses. Generally, with respect to morphology, Flaviviruses contain a central
nucleocapsid surrounded by a lipid bilayer. Virions are spherical and have a diameter
of about 40-50 nm. Their cores are about 25-30 nm in diameter. Along the outer
surface of the virion envelope are projections that are about 5-10 nm long with terminal
knobs about 2 nm in diameter.

The HCV genome is comprised of RNA. It is known that RNA-containing
viruses have relatively high rates of spontaneous mutation. Therefore, there can be
multiple strains, which can be virulent or avirulent, within the HCV class or species.
The ORF of HCV, including the translation spans of the core, non-structural, and
envelope proteins, is shown in U.S. Patent Nos. 5,856,437 and 5,350,671.

The terms "polypeptide" and "protein" refer to a polymer of amino acid 
residues and are not limited to a minimum length of the product. Thus, peptides, 
oligopeptides, dimers, multimers, and the like, are included within the definition. Both 
full-length proteins and fragments thereof are encompassed by the definition. The
terms also include postexpression modifications of the polypeptide, for example, 
glycosylation, acetylation, phosphorylation and the like. Furthermore, for purposes of 
the present invention, a "polypeptide" refers to a protein which includes modifications, 
such as deletions, additions and substitutions (generally conservative in nature), to the 
native sequence, so long as the protein maintains the desired activity. These 
modifications may be deliberate, as through site-directed mutagenesis, or may be 
accidental, such as through mutations of hosts which produce the proteins or errors due 
to PCR amplification. 

An HCV polypeptide is a polypeptide, as defined above, derived from the HCV
polyprotein. The polypeptide need not be physically derived from HCV, but may be
synthetically or recombinantly produced. Moreover, the polypeptide may be derived
from any of the various HCV strains, such as from strains 1, 2, 3 or 4 of HCV. A
number of conserved and variable regions are known between these strains and, in
general, the amino acid sequences of epitopes derived from these regions will have a
high degree of sequence homology, e.g., amino acid sequence homology of more than
30%, preferably more than 40%, when the two sequences are aligned and homology
determined by any of the programs or algorithms described herein. Thus, for example,
the term "NS4" polypeptide refers to native NS4 from any of the various HCV strains,
as well as NS4 analogs, muteins and immunogenic fragments, as defined further below.
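
The homology thresholds above (more than 30%, preferably more than 40%) presuppose
that the two sequences have been aligned. As a minimal sketch, assuming the sequences
are already aligned to equal length with "-" marking gaps, percent identity over the
aligned, non-gap positions could be computed as follows. The function name and the toy
sequences are hypothetical, and simple identity is used here only as a stand-in for
homology; the alignment programs and algorithms referred to in this specification are
not reproduced.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Both inputs are assumed to be pre-aligned (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, not taken from any HCV strain).
    print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```

A real comparison would first run an alignment program; this sketch covers only the
final counting step.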

Further, the terms "ΔNS35," "delNS35," "ΔNS3NS5," and "ΔNS3-5" as used
herein refer to a mutant polypeptide, comprising at least portions of NS3, NS4, or NS5,
comprising a deletion in, or mutation of, the NS3 protease active site region to render
the protease non-functional. In one embodiment, ΔNS3-5 comprises amino acids
1242-3011, as shown in FIG. 5, or polypeptides substantially homologous thereto. It will be
readily apparent to one of ordinary skill in the art how to determine that NS3 protease
has been rendered non-functional. If the protease is non-functional, the polypeptide is
not cleaved and one will obtain protein of the expected full-length molecular weight
upon expression. As set forth in Example 2 and
Figure 15, using SDS-PAGE, 4-20%, a protein having a molecular weight of
approximately 194kD was obtained when strain AD3 was transformed with
pd.ΔNS3NS5.PJ clone #5. One skilled in the art could readily determine whether a
protein of the desired molecular weight was expressed for any given deletion or
mutation.
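
The expected size can be sanity-checked with a back-of-the-envelope calculation. The
sketch below assumes the common rule of thumb of roughly 110 Da per amino acid residue
(an assumption, not a value from this specification), counts residues 1242-3011
inclusive, and ignores any additional vector-encoded residues. The function names are
illustrative only.

```python
# Rule-of-thumb average residue mass; an assumption, not from the specification.
AVERAGE_RESIDUE_MASS_DA = 110.0


def residue_count(first: int, last: int) -> int:
    """Number of residues in an inclusive amino acid range."""
    if last < first:
        raise ValueError("last must be >= first")
    return last - first + 1


def approx_mass_kda(n_residues: int, avg_mass_da: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Approximate polypeptide mass in kilodaltons from its residue count."""
    return n_residues * avg_mass_da / 1000.0


if __name__ == "__main__":
    n = residue_count(1242, 3011)
    print(n, "residues, roughly", round(approx_mass_kda(n), 1), "kD")
```

Under these assumptions the estimate is roughly 195 kD for 1770 residues, close to the
approximately 194kD protein described above.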

The terms "analog" and "mutein" refer to biologically active derivatives of the
reference molecule, or fragments of such derivatives, that retain desired activity, such
as the ability to stimulate a cell-mediated immune response, as defined below. In
general, the term "analog" refers to compounds having a native polypeptide sequence
and structure with one or more amino acid additions, substitutions (generally
conservative in nature) and/or deletions, relative to the native molecule, so long as the
modifications do not destroy immunogenic activity. The term "mutein" refers to
peptides having one or more peptide mimics ("peptoids"), such as those described in
International Publication No. WO 91/04282. Preferably, the analog or mutein has at
least the same immunoactivity as the native molecule. Methods for making
polypeptide analogs and muteins are known in the art and are described further below.

Particularly preferred analogs include substitutions that are conservative in
nature, i.e., those substitutions that take place within a family of amino acids that are
related in their side chains. Specifically, amino acids are generally divided into four
families: (1) acidic - aspartate and glutamate; (2) basic - lysine, arginine, histidine;
(3) non-polar - alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine,
tryptophan; and (4) uncharged polar - glycine, asparagine, glutamine, cysteine, serine,
threonine, tyrosine. Phenylalanine, tryptophan, and tyrosine are sometimes classified
as aromatic amino acids. For example, it is reasonably predictable that an isolated
replacement of leucine with isoleucine or valine, an aspartate with a glutamate, a
threonine with a serine, or a similar conservative replacement of an amino acid with a
structurally related amino acid, will not have a major effect on the biological activity.
For example, the polypeptide of interest may include up to about 5-10 conservative or
non-conservative amino acid substitutions, or even up to about 15-25 conservative or
non-conservative amino acid substitutions, or any integer between 5-25, so long as the
desired function of the molecule remains intact. One of skill in the art may readily
determine regions of the molecule of interest that can tolerate change by reference to
Hopp/Woods and Kyte-Doolittle plots, well known in the art.
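
The four families listed above translate directly into a lookup table. The following
is a minimal sketch, using one-letter amino acid codes, that labels a substitution as
conservative when both residues fall in the same family and tallies substitutions
between two equal-length sequences. The function names and example sequences are
hypothetical, and the classification says nothing about whether activity is actually
retained.

```python
# Families exactly as grouped in the text, in one-letter codes.
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "non-polar": "AVLIPFMW",
    "uncharged polar": "GNQCSTY",
}
RESIDUE_TO_FAMILY = {aa: fam for fam, members in FAMILIES.items() for aa in members}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two (different) residues belong to the same family."""
    o, r = original.upper(), replacement.upper()
    if o not in RESIDUE_TO_FAMILY or r not in RESIDUE_TO_FAMILY:
        raise ValueError("expected one-letter codes for the 20 standard amino acids")
    return o != r and RESIDUE_TO_FAMILY[o] == RESIDUE_TO_FAMILY[r]


def count_substitutions(native: str, variant: str):
    """Return (conservative, non_conservative) counts for equal-length sequences."""
    if len(native) != len(variant):
        raise ValueError("sequences must have equal length")
    conservative = 0
    non_conservative = 0
    for o, r in zip(native.upper(), variant.upper()):
        if o == r:
            continue  # unchanged position
        if is_conservative(o, r):
            conservative += 1
        else:
            non_conservative += 1
    return conservative, non_conservative


if __name__ == "__main__":
    # Leucine -> isoleucine is within the non-polar family.
    print(is_conservative("L", "I"))
    print(count_substitutions("LDTK", "IETD"))
```

Insertions and deletions are outside the scope of this sketch, which compares
position by position.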

By "fragment" is intended a polypeptide consisting of only a part of the intact
full-length polypeptide sequence and structure. The fragment can include a C-terminal
deletion and/or an N-terminal deletion of the native polypeptide. An "immunogenic
fragment" of a particular HCV protein will generally include at least about 5-10
contiguous amino acid residues of the full-length molecule, preferably at least about
15-25 contiguous amino acid residues of the full-length molecule, and most preferably
at least about 20-50 or more contiguous amino acid residues of the full-length
molecule, that define an epitope, or any integer between 5 amino acids and the
full-length sequence, provided that the fragment in question retains immunogenic activity,
as measured by the assays described herein. For a description of various HCV
epitopes, see, e.g., Chien et al., Proc. Natl. Acad. Sci. USA (1992) 89:10011-10015;
Chien et al., J. Gastroent. Hepatol. (1993) 8:S33-39; Chien et al., International
Publication No. WO 93/00365; Chien, D.Y., International Publication No. WO
94/01778; commonly owned, allowed U.S. Patent Application Serial Nos. 08/403,590
and 08/444,818.
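
To make the fragment lengths above concrete, the following sketch enumerates every
contiguous fragment of a chosen length from a sequence, reporting 1-based start
positions. The function name and the toy sequence are hypothetical. Whether any given
fragment retains immunogenic activity would still have to be established by assay.

```python
from typing import List, Tuple


def contiguous_fragments(sequence: str, length: int, step: int = 1) -> List[Tuple[int, str]]:
    """Return (1-based start, fragment) for each window of `length` residues.

    Windows advance by `step` residues; a sequence shorter than `length`
    yields an empty list.
    """
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive")
    return [
        (i + 1, sequence[i:i + length])
        for i in range(0, len(sequence) - length + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence (hypothetical); 5-residue fragments, one residue apart.
    for start, frag in contiguous_fragments("ABCDEFG", 5):
        print(start, frag)
```

With `step` greater than 1 the same function produces partially overlapping
fragments, which is one simple way to tile a longer sequence.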

The term "epitope" as used herein refers to a sequence of at least about 3 to 5,
preferably about 5 to 10 or 15, and not more than about 1,000 amino acids (or any
integer therebetween), which define a sequence that by itself or as part of a larger
sequence, binds to an antibody generated in response to such sequence. There is no
critical upper limit to the length of the fragment, which may comprise nearly the
full-length of the protein sequence, or even a fusion protein comprising two or more
epitopes from the HCV polyprotein. An epitope for use in the subject invention is not
limited to a polypeptide having the exact sequence of the portion of the parent protein
from which it is derived. Indeed, viral genomes are in a state of constant flux and
contain several variable domains which exhibit relatively high degrees of variability
between isolates. Thus the term "epitope" encompasses sequences identical to the
native sequence, as well as modifications to the native sequence, such as deletions,
additions and substitutions (generally conservative in nature).

Regions of a given polypeptide that include an epitope can be identified using
any number of epitope mapping techniques, well known in the art. See, e.g., Epitope
Mapping Protocols in Methods in Molecular Biology, Vol. 66 (Glenn E. Morris, Ed.,
1996) Humana Press, Totowa, New Jersey. For example, linear epitopes may be
determined by e.g., concurrently synthesizing large numbers of peptides on solid
supports, the peptides corresponding to portions of the protein molecule, and reacting
the peptides with antibodies while the peptides are still attached to the supports. Such
techniques are known in the art and described in, e.g., U.S. Patent No. 4,708,871;
Geysen et al. (1984) Proc. Natl. Acad. Sci. 81:3998-4002; Geysen et al. (1986)
Molec. Immunol. 23:709-715. Similarly, conformational epitopes are readily
identified by determining spatial conformation of amino acids such as by, e.g., x-ray
crystallography and 2-dimensional nuclear magnetic resonance. See, e.g., Epitope
Mapping Protocols, supra. Antigenic regions of proteins can also be identified using
standard antigenicity and hydropathy plots, such as those calculated using, e.g., the
Omiga version 1.0 software program available from the Oxford Molecular Group. This
computer program employs the Hopp/Woods method, Hopp et al., Proc. Natl. Acad.
Sci. USA (1981) 78:3824-3828 for determining antigenicity profiles, and the
Kyte-Doolittle technique, Kyte et al., J. Mol. Biol. (1982) 157:105-132 for hydropathy plots.
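
As an illustration of the hydropathy plots mentioned above, the following sketch
computes a sliding-window Kyte-Doolittle profile. The per-residue values are the
standard Kyte-Doolittle hydropathy indices as commonly tabulated. The window size of
9, the function name and the example sequences are assumptions made for illustration,
and this is not a reproduction of the Omiga software. The Hopp/Woods antigenicity
scale is not shown.

```python
# Kyte-Doolittle hydropathy indices (one-letter codes), as commonly tabulated.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean hydropathy of each window of `window` residues, in sequence order.

    Positive values indicate more hydrophobic stretches; negative values
    indicate more hydrophilic stretches.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence (hypothetical); prints one averaged value per window.
    for value in hydropathy_profile("ACDEFGHIKLMNPQRSTVWY", window=9):
        print(round(value, 2))
```

Plotting the returned values against window position gives the familiar hydropathy
plot, with strongly negative stretches being candidates for surface-exposed regions.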

As used herein, the term "conformational epitope" refers to a portion of a
full-length protein, or an analog or mutein thereof, having structural features native to the
amino acid sequence encoding the epitope within the full-length natural protein. Native
structural features include, but are not limited to, glycosylation and three dimensional
structure. Preferably, a conformational epitope is produced recombinantly and is
expressed in a cell from which it is extractable under conditions which preserve its
desired structural features, e.g. without denaturation of the epitope. Such cells include
bacteria, yeast, insect, and mammalian cells. Expression and isolation of recombinant
conformational epitopes from the HCV polyprotein are described in e.g., International
Publication Nos. WO 96/04301, WO 94/01778, WO 95/33053, WO 92/08734.

An "immunological response" to an HCV antigen (including both polypeptide
and polynucleotides encoding polypeptides that are expressed in vivo) or composition is
the development in a subject of a humoral and/or a cellular immune response to
molecules present in the composition of interest. For purposes of the present invention,
a "humoral immune response" refers to an immune response mediated by antibody
molecules, while a "cellular immune response" is one mediated by T-lymphocytes
and/or other white blood cells. One important aspect of cellular immunity involves an
antigen-specific response by cytolytic T-cells ("CTLs"). CTLs have specificity for
peptide antigens that are presented in association with proteins encoded by the major
histocompatibility complex (MHC) and expressed on the surfaces of cells. CTLs help
induce and promote the intracellular destruction of intracellular microbes, or the lysis
of cells infected with such microbes. Another aspect of cellular immunity involves an
antigen-specific response by helper T-cells. Helper T-cells act to help stimulate the
function, and focus the activity of, nonspecific effector cells against cells displaying
peptide antigens in association with MHC molecules on their surface. A "cellular
immune response" also refers to the production of cytokines, chemokines and other
such molecules produced by activated T-cells and/or other white blood cells, including
those derived from CD4+ and CD8+ T-cells.

A composition or vaccine that elicits a cellular immune response may serve to
sensitize a vertebrate subject by the presentation of antigen in association with MHC
molecules at the cell surface. The cell-mediated immune response is directed at, or
near, cells presenting antigen at their surface. In addition, antigen-specific T- 
lymphocytes can be generated to allow for the future protection of an immunized host. 

The ability of a particular antigen to stimulate a cell-mediated immunological
response may be determined by a number of assays, such as by lymphoproliferation
(lymphocyte activation) assays, CTL cytotoxic cell assays, or by assaying for
T-lymphocytes specific for the antigen in a sensitized subject. Such assays are well
known in the art. See, e.g., Erickson et al., J. Immunol. (1993) 151:4189-4199; Doe et
al., Eur. J. Immunol. (1994) 24:2369-2376; and the examples below.

Thus, an immunological response as used herein may be one which stimulates
the production of CTLs, and/or the production or activation of helper T-cells. The
antigen of interest may also elicit an antibody-mediated immune response. Hence, an
immunological response may include one or more of the following effects: the
production of antibodies by B-cells; and/or the activation of suppressor T-cells and/or
γδ T-cells directed specifically to an antigen or antigens present in the composition or
vaccine of interest. These responses may serve to neutralize infectivity, and/or mediate
antibody-complement, or antibody dependent cell cytotoxicity (ADCC) to provide
protection or alleviation of symptoms to an immunized host. Such responses can be 
determined using standard immunoassays and neutralization assays, well known in the 
art. 

A "coding sequence" or a sequence which "encodes" a selected polypeptide, is a
nucleic acid molecule which is transcribed (in the case of DNA) and translated (in the
case of mRNA) into a polypeptide in vitro or in vivo when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are
determined by a start codon at the 5' (amino) terminus and a translation stop codon at
the 3' (carboxy) terminus. A transcription termination sequence may be located 3' to
the coding sequence. 

A "nucleic acid" molecule or "polynucleotide" can include both double- and
single-stranded sequences and refers to, but is not limited to, cDNA from viral,
procaryotic or eucaryotic mRNA, genomic DNA sequences from viral (e.g. DNA
viruses and retroviruses) or procaryotic DNA, and especially synthetic DNA sequences.
The term also captures sequences that include any of the known base analogs of DNA
and RNA. 

"Operably linked" refers to an arrangement of elements wherein the 
components so described are configured so as to perform their desired function. Thus,
a given promoter operably linked to a coding sequence is capable of effecting the
expression of the coding sequence when the proper transcription factors, etc., are
present. The promoter need not be contiguous with the coding sequence, so long as it
functions to direct the expression thereof. Thus, for example, intervening untranslated
yet transcribed sequences can be present between the promoter sequence and the coding
sequence, as can transcribed introns, and the promoter sequence can still be considered
"operably linked" to the coding sequence. 

"Recombinant" as used herein to describe a nucleic acid molecule means a 
polynucleotide of genomic, cDNA, viral, semisynthetic, or synthetic origin which, by 
virtue of its origin or manipulation is not associated with all or a portion of the
polynucleotide with which it is associated in nature. The term "recombinant" as used
with respect to a protein or polypeptide means a polypeptide produced by expression of
a recombinant polynucleotide. In general, the gene of interest is cloned and then
expressed in transformed organisms, as described further below. The host organism
expresses the foreign gene to produce the protein under expression conditions. 

A "control element" refers to a polynucleotide sequence which aids in the
expression of a coding sequence to which it is linked. The term includes promoters,
transcription termination sequences, upstream regulatory domains, polyadenylation
signals, untranslated regions, including 5'-UTRs and 3'-UTRs and when appropriate,
leader sequences and enhancers, which collectively provide for the transcription and 
translation of a coding sequence in a host cell. 

A "promoter" as used herein is a DNA regulatory region capable of binding
RNA polymerase in a host cell and initiating transcription of a downstream (3'
direction) coding sequence operably linked thereto. For purposes of the present
invention, a promoter sequence includes the minimum number of bases or elements
necessary to initiate transcription of a gene of interest at levels detectable above
background. Within the promoter sequence is a transcription initiation site, as well as
protein binding domains (consensus sequences) responsible for the binding of RNA
polymerase. Eucaryotic promoters will often, but not always, contain "TATA" boxes
and "CAT" boxes. 

A control sequence "directs the transcription" of a coding sequence in a cell 
when RNA polymerase will bind the promoter sequence and transcribe the coding 
sequence into mRNA, which is then translated into the polypeptide encoded by the
coding sequence. 

"Expression cassette" or "expression construct" refers to an assembly which is 
capable of directing the expression of the sequence(s) or gene(s) of interest. The 
expression cassette includes control elements, as described above, such as a promoter
which is operably linked to (so as to direct transcription of) the sequence(s) or gene(s)
of interest, and often includes a polyadenylation sequence as well. Within certain
embodiments of the invention, the expression cassette described herein may be
contained within a plasmid construct. In addition to the components of the expression
cassette, the plasmid construct may also include one or more selectable markers, a
signal which allows the plasmid construct to exist as single-stranded DNA (e.g., an M13
origin of replication), at least one multiple cloning site, and a "mammalian" origin of
replication (e.g., an SV40 or adenovirus origin of replication).

"Transformation," as used herein, refers to the insertion of an exogenous
polynucleotide into a host cell, irrespective of the method used for insertion: for
example, transformation by direct uptake, transfection, infection, and the like. For
particular methods of transfection, see further below. The exogenous polynucleotide
may be maintained as a nonintegrated vector, for example, an episome, or alternatively,
may be integrated into the host genome. 

A "host cell" is a cell which has been transformed, or is capable of 
transformation, by an exogenous DNA sequence. 

By "isolated" is meant, when referring to a polypeptide, that the indicated
molecule is separate and discrete from the whole organism with which the molecule is
found in nature or is present in the substantial absence of other biological
macromolecules of the same type. The term "isolated" with respect to a polynucleotide is a
nucleic acid molecule devoid, in whole or part, of sequences normally associated with
it in nature; or a sequence, as it exists in nature, but having heterologous sequences in
association therewith; or a molecule disassociated from the chromosome.

The term "purified" as used herein preferably means that at least 75% by weight,
more preferably at least 85% by weight, more preferably still at least 95% by weight,
and most preferably at least 98% by weight, of biological macromolecules of the same
type are present.

"Homology" refers to the percent identity between two polynucleotide or two
polypeptide moieties. Two DNA, or two polypeptide sequences are "substantially
homologous" to each other when the sequences exhibit at least about 50%, preferably
at least about 75%, more preferably at least about 80%-85%, preferably at least about
90%, and most preferably at least about 95%-98%, or more, sequence identity over a
defined length of the molecules. As used herein, substantially homologous also refers
to sequences showing complete identity to the specified DNA or polypeptide sequence. 
The term "substantially homologous" as used herein in reference to ΔNS35 generally
refers to an HCV nucleic or amino acid sequence that is at least 60% identical to the
entire sequence of the polypeptide encoded by ΔNS35 (see FIG. 5), where the sequence
identity is preferably at least 75%, more preferably at least 80%, still more preferably at
least about 85%, especially more than about 90%, most preferably 95% or greater,
particularly 98% or greater. These homologous polypeptides include fragments,
including mutants and allelic variants of the fragments. Identity between the two
sequences is preferably determined by the Smith-Waterman homology search algorithm
as implemented in the MPSRCH program (Oxford Molecular), using an affine gap
search with parameters gap open penalty=12 and gap extension penalty=1. Thus, for
example, the present invention includes an isolate which is 80% identical to a
polypeptide encoded by ΔNS35. In some aspects of the invention, the polypeptide of
the present invention is substantially homologous to ΔNS35.

In general, "identity" refers to an exact nucleotide-to-nucleotide or
amino acid-to-amino acid correspondence of two polynucleotides or polypeptide sequences,
respectively. Percent identity can be determined by a direct comparison of the
sequence information between two molecules by aligning the sequences, counting the
exact number of matches between the two aligned sequences, dividing by the length of
the shorter sequence, and multiplying the result by 100. Readily available computer
programs can be used to aid in the analysis, such as ALIGN, Dayhoff, M.O. in Atlas of
Protein Sequence and Structure, M.O. Dayhoff ed., 5 Suppl. 3:353-358, National
Biomedical Research Foundation, Washington, DC, which adapts the local homology
algorithm of Smith and Waterman, Advances in Appl. Math. 2:482-489, 1981 for
peptide analysis. Programs for determining nucleotide sequence identity are available
in the Wisconsin Sequence Analysis Package, Version 8 (available from Genetics
Computer Group, Madison, WI) for example, the BESTFIT, FASTA and GAP
programs, which also rely on the Smith and Waterman algorithm. These programs are
readily utilized with the default parameters recommended by the manufacturer and
described in the Wisconsin Sequence Analysis Package referred to above. For
example, percent identity of a particular nucleotide sequence to a reference sequence
can be determined using the homology algorithm of Smith and Waterman with a
default scoring table and a gap penalty of six nucleotide positions.
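
By way of a non-limiting sketch, the direct-comparison calculation described above (count exact matches between two aligned sequences, divide by the length of the shorter sequence, multiply by 100) might be written as follows. The function name and the use of "-" as the gap character are assumptions made for this example; the two inputs are assumed to have been aligned already by one of the programs named above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal aligned length.

    Matches are exact residue-to-residue correspondences; gap characters
    never count as matches. The denominator is the ungapped length of the
    shorter of the two sequences.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter
```

For instance, the aligned pair `ACGT-A` and `ACCTGA` shares four matching positions, and the shorter sequence has five residues, giving 80% identity.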

Another method of establishing percent identity in the context of the present
invention is to use the MPSRCH package of programs copyrighted by the University of
Edinburgh, developed by John F. Collins and Shane S. Sturrok, and distributed by
IntelliGenetics, Inc. (Mountain View, CA). From this suite of packages the
Smith-Waterman algorithm can be employed where default parameters are used for the
scoring table (for example, gap open penalty of 12, gap extension penalty of one, and a
gap of six). From the data generated the "Match" value reflects "sequence identity."
Other suitable programs for calculating the percent identity or similarity between
sequences are generally known in the art, for example, another alignment program is
BLAST, used with default parameters. For example, BLASTN and BLASTP can be
used using the following default parameters: genetic code = standard; filter = none;
strand = both; cutoff = 60; expect = 10; Matrix = BLOSUM62; Descriptions = 50
sequences; sort by = HIGH SCORE; Databases = non-redundant, GenBank + EMBL +
DDBJ + PDB + GenBank CDS translations + Swiss protein + Spupdate + PIR. Details
of these programs can be found at the following internet address:
http://www.ncbi.nlm.gov/cgi-bin/BLAST.
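
For illustration, a Smith-Waterman local alignment score with an affine gap penalty can be sketched as below. This is not the MPSRCH implementation: the function name, the simple match/mismatch scores (used here in place of a scoring table such as BLOSUM62), and the convention that a gap of length k costs gap_open + (k - 1) × gap_extend are all assumptions made for the example. Only the gap open penalty of 12 and gap extension penalty of one are taken from the text.

```python
def smith_waterman_affine(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=1):
    """Return the best local alignment score of sequences a and b.

    Uses three dynamic-programming matrices (Gotoh's formulation):
      H: best score of an alignment ending at (i, j)
      E: best score ending with a gap in `a` (consumes a residue of b)
      F: best score ending with a gap in `b` (consumes a residue of a)
    """
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]
    F = [[neg_inf] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Open a new gap from H, or extend an existing gap.
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = H[i - 1][j - 1] + substitution
            # Local alignment: scores are never allowed to fall below zero.
            H[i][j] = max(0, diagonal, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these assumed scores, aligning `AAAAATTTTT` against `AAAAAGGGTTTTT` yields ten matches (50) less one three-residue gap (12 + 2), so the affine penalty makes a single long gap cheaper than three separate gaps would be.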

Alternatively, homology can be determined by hybridization of polynucleotides 
under conditions which form stable duplexes between homologous regions, followed 
by digestion with single-stranded-specific nuclease(s), and size determination of the 
digested fragments. DNA sequences that are substantially homologous can be
identified in a Southern hybridization experiment under, for example, stringent
conditions, as defined for that particular system. Defining appropriate hybridization
conditions is within the skill of the art. See, e.g., Sambrook et al., supra; DNA Cloning,
supra; Nucleic Acid Hybridization, supra.

"Stringency" refers to conditions in a hybridization reaction that favor 

association of very similar sequences over sequences that differ. For example, the
combination of temperature and salt concentration should be chosen that is
approximately 12° to 20°C below the calculated Tm of the hybrid under study. The
temperature and salt conditions can often be determined empirically in preliminary
experiments in which samples of genomic DNA immobilized on filters are hybridized
to the sequence of interest and then washed under conditions of different stringencies.
See Sambrook et al., at page 9.50.

Variables to consider when performing, for example, a Southern blot are (1) the
complexity of the DNA being blotted and (2) the homology between the probe and the
sequences being detected. The total amount of the fragment(s) to be studied can vary a
magnitude of 10, from 0.1 to 1 µg for a plasmid or phage digest to 10⁻⁹ to 10⁻⁸ g for a
single copy gene in a highly complex eukaryotic genome. For lower complexity
polynucleotides, substantially shorter blotting, hybridization, and exposure times, a
smaller amount of starting polynucleotides, and lower specific activity of probes can be
used. For example, a single-copy yeast gene can be detected with an exposure time of
only 1 hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing
for 4-8 hours with a probe of 10⁸ cpm/µg. For a single-copy mammalian gene a
conservative approach would start with 10 µg of DNA, blot overnight, and hybridize
overnight in the presence of 10% dextran sulfate using a probe of greater than 10⁸
cpm/µg, resulting in an exposure time of ~24 hours.

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid 
between the probe and the fragment of interest, and consequently, the appropriate 
conditions for hybridization and washing. In many cases the probe is not 100% 
homologous to the fragment. Other commonly encountered variables include the 
length and total G+C content of the hybridizing sequences and the ionic strength and 
formamide content of the hybridization buffer. The effects of all of these factors can be 
approximated by a single equation:

Tm = 81 + 16.6(log₁₀ Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch),
where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in
base pairs (slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138:267-
284). In general, convenient hybridization temperatures in the presence of 50%
formamide are 42°C for a probe which is 95% to 100% homologous to the target
fragment, 37°C for 90% to 95% homology, and 32°C for 85% to 90% homology. For
lower homologies, formamide content should be lowered and temperature adjusted
accordingly, using the equation above. If the homology between the probe and the
target fragment is not known, the simplest approach is to start with both hybridization
and wash conditions which are nonstringent. If non-specific bands or high background
are observed after autoradiography, the filter can be washed at high stringency and
reexposed. If the time required for exposure makes this approach impractical, several
hybridization and/or washing stringencies should be tested in parallel.
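
As a worked illustration, the melting temperature equation above can be evaluated directly. The function and parameter names below are assumptions made for this example; the coefficients are those of the equation, with the salt concentration taken to be in mol/L of monovalent ions.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        length_bp, mismatch_percent=0.0):
    """Approximate Tm (in degrees C) of a DNA-DNA hybrid.

    Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C) - 0.6*(%formamide)
         - 600/n - 1.5*(%mismatch)
    """
    if salt_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and length must be positive")
    return (
        81.0
        + 16.6 * math.log10(salt_molar)
        + 0.4 * gc_percent
        - 0.6 * formamide_percent
        - 600.0 / length_bp
        - 1.5 * mismatch_percent
    )
```

For example, a 500 base pair hybrid with 50% G+C content in 1 M monovalent salt and 50% formamide, with no mismatches, gives 81 + 0 + 20 - 30 - 1.2 = 69.8°C; each additional percent of mismatch lowers this estimate by 1.5°C.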

By "nucleic acid immunization" is meant the introduction of a nucleic acid 
molecule encoding one or more selected antigens into a host cell, for the in vivo 
expression of the antigen or antigens. The nucleic acid molecule can be introduced 
directly into the recipient subject, such as by injection, inhalation, oral, intranasal and 
mucosal administration, or the like, or can be introduced ex vivo, into cells which have 
been removed from the host. In the latter case, the transformed cells are reintroduced
into the subject where an immune response can be mounted against the antigen encoded 
by the nucleic acid molecule. 

An "open reading frame" or ORF is a region of a polynucleotide sequence 
which encodes a polypeptide; this region can represent a portion of a coding sequence
or a total coding sequence. 

As used herein, the term "antibody" refers to a polypeptide or group of 
polypeptides which comprise at least one antigen binding site. An "antigen binding 
site" is formed from the folding of the variable domains of an antibody molecule(s) to
form three-dimensional binding sites with an internal surface shape and charge
distribution complementary to the features of an epitope of an antigen, which allows
specific binding to form an antibody-antigen complex. An antigen binding site may be
formed from a heavy- and/or light-chain domain (VH and VL, respectively), which
form hypervariable loops which contribute to antigen binding. The term "antibody"
includes, without limitation, polyclonal antibodies, monoclonal antibodies, chimeric
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single-domain
antibodies. In many cases, the binding phenomenon of antibodies to antigens is
equivalent to other ligand/anti-ligand binding.

If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit,
goat, horse, etc.) is immunized with an immunogenic polypeptide bearing an HCV
epitope(s). Serum from the immunized animal is collected and treated according to
known procedures. If serum containing polyclonal antibodies to an HCV epitope
contains antibodies to other antigens, the polyclonal antibodies can be purified by
immunoaffinity chromatography. Techniques for producing and processing polyclonal
antisera are known in the art, see for example, Mayer and Walker, eds. (1987)
IMMUNOCHEMICAL METHODS IN CELL AND MOLECULAR BIOLOGY
(Academic Press, London).

Monoclonal antibodies directed against HCV epitopes can also be readily
produced by one skilled in the art. The general methodology for making monoclonal
antibodies by hybridomas is well known. Immortal antibody-producing cell lines can
be created by cell fusion, and also by other techniques such as direct transformation of
B lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g.,
M. Schreier et al. (1980) HYBRIDOMA TECHNIQUES; Hammerling et al. (1981),
MONOCLONAL ANTIBODIES AND T-CELL HYBRIDOMAS; Kennett et al.
(1980) MONOCLONAL ANTIBODIES; see also, U.S. Pat. Nos. 4,341,761; 4,399,121;
4,427,783; 4,444,887; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of
monoclonal antibodies produced against HCV epitopes can be screened for various
properties; i.e., for isotype, epitope affinity, etc. As used herein, a "single domain
antibody" (dAb) is an antibody which is comprised of a VH domain, which binds
specifically with a designated antigen. A dAb does not contain a VL domain, but may
contain other antigen binding domains known to exist in antibodies, for example, the
kappa and lambda domains. Methods for preparing dAbs are known in the art. See, for
example, Ward et al., Nature 341:544 (1989).

Antibodies can also be comprised of VH and VL domains, as well as other
known antigen binding domains. Examples of these types of antibodies and methods
for their preparation are known in the art (see, e.g., U.S. Pat. No. 4,816,467), and
include the following. For example, "vertebrate antibodies" refers to antibodies which
are tetramers or aggregates thereof, comprising light and heavy chains which are
usually aggregated in a "Y" configuration and which may or may not have covalent
linkages between the chains. In vertebrate antibodies, the amino acid sequences of the
chains are homologous with those sequences found in antibodies produced in
vertebrates, whether in situ or in vitro (for example, in hybridomas). Vertebrate
antibodies include, for example, purified polyclonal antibodies and monoclonal
antibodies, methods for the preparation of which are described infra.

"Hybrid antibodies" are antibodies where chains are separately homologous 
with reference to mammalian antibody chains and represent novel assemblies of them, 

so that two different antigens are precipitable by the tetramer or aggregate. In hybrid
antibodies, one pair of heavy and light chains are homologous to those found in an
antibody raised against a first antigen, while a second pair of chains are homologous to
those found in an antibody raised against a second antigen. This results in the property
of "divalence", i.e., the ability to bind two antigens simultaneously. Such hybrids can
also be formed using chimeric chains, as set forth below.

"Chimeric antibodies" refers to antibodies in which the heavy and/or light 
chains are fusion proteins. Typically, one portion of the amino acid sequences of the
chain is homologous to corresponding sequences in an antibody derived from a
particular species or a particular class, while the remaining segment of the chain is
homologous to the sequences derived from another species and/or class. Usually, the
variable region of both light and heavy chains mimics the variable regions of antibodies
derived from one species of vertebrates, while the constant portions are homologous to
the sequences in the antibodies derived from another species of vertebrates. However,
the definition is not limited to this particular example. Also included is any antibody in
which either or both of the heavy or light chains are composed of combinations of
sequences mimicking the sequences in antibodies of different sources, whether these
sources be from differing classes or different species of origin, and whether or not the
fusion point is at the variable/constant boundary. Thus, it is possible to produce
antibodies in which neither the constant nor the variable region mimics known antibody
sequences. It then becomes possible, for example, to construct antibodies whose
variable region has a higher specific affinity for a particular antigen, or whose constant
region can elicit enhanced complement fixation, or to make other improvements in
properties possessed by a particular constant region.

Another example is "altered antibodies", which refers to antibodies in which the
naturally occurring amino acid sequence in a vertebrate antibody has been varied.
Utilizing recombinant DNA techniques, antibodies can be redesigned to obtain desired
characteristics. The possible variations are many, and range from the changing of one
or more amino acids to the complete redesign of a region, for example, the constant
region. Changes in the constant region are made, in general, to attain desired cellular
process characteristics, e.g., changes in complement fixation, interaction with membranes,
and other effector functions. Changes in the variable region can be made to alter antigen
binding characteristics. The antibody can also be engineered to aid the specific delivery
of a molecule or substance to a specific cell or tissue site. The desired alterations can be
made by known techniques in molecular biology, e.g., recombinant techniques,
site-directed mutagenesis, etc.

Yet another example are "univalent antibodies", which are aggregates
comprised of a heavy-chain/light-chain dimer bound to the Fc (i.e., stem) region of a
second heavy chain. This type of antibody escapes antigenic modulation. See, e.g.,
Glennie et al., Nature 295:712 (1982). Included also within the definition of antibodies
are "Fab" fragments of antibodies. The "Fab" region refers to those portions of the
heavy and light chains which are roughly equivalent, or analogous, to the sequences
which comprise the branch portion of the heavy and light chains, and which have been
shown to exhibit immunological binding to a specified antigen, but which lack the
effector Fc portion. "Fab" includes aggregates of one heavy and one light chain
(commonly known as Fab'), as well as tetramers containing the 2H and 2L chains
(referred to as F(ab)2), which are capable of selectively reacting with a designated
antigen or antigen family. Fab antibodies can be divided into subsets analogous to those
described above, i.e., "vertebrate Fab", "hybrid Fab", "chimeric Fab", and "altered Fab".
Methods of producing Fab fragments of antibodies are known within the art and
include, for example, proteolysis, and synthesis by recombinant techniques.

"Antigen-antibody complex" refers to the complex formed by an antibody that 
is specifically bound to an epitope on an antigen. 

"Immunogenic polypeptide" refers to a polypeptide that elicits a cellular and/or 
humoral immune response in a mammal, whether alone or linked to a carrier, in the
presence or absence of an adjuvant. 

"Antigenic determinant" refers to the site on an antigen or hapten to which a 
specific antibody molecule or specific cell surface receptor binds. 

As used herein, "treatment" refers to any of (i) the prevention of infection or
reinfection, as in a traditional vaccine, (ii) the reduction or elimination of symptoms,
and (iii) the substantial or complete elimination of the pathogen in question. Treatment
may be effected prophylactically (prior to infection) or therapeutically (following
infection).

By "vertebrate subject" is meant any member of the subphylum cordata, 
including, without limitation, humans and other primates, including non-human
primates such as chimpanzees and other apes and monkey species; farm animals such
as cattle, sheep, pigs, goats and horses; domestic mammals such as dogs and cats;
laboratory animals including rodents such as mice, rats and guinea pigs; birds,
including domestic, wild and game birds such as chickens, turkeys and other
gallinaceous birds, ducks, geese, and the like. The term does not denote a particular
age. Thus, both adult and newborn individuals are intended to be covered. The
invention described herein is intended for use in any of the above vertebrate species,
since the immune systems of all of these vertebrates operate similarly. 

II. Modes of Carrying Out the Invention

Before describing the present invention in detail, it is to be understood that this 
invention is not limited to particular formulations or process parameters as such may, of 
course, vary. It is also to be understood that the terminology used herein is for the 
purpose of describing particular embodiments of the invention only, and is not intended 
to be limiting. 

Although a number of compositions and methods similar or equivalent to those 
described herein can be used in the practice of the present invention, the preferred 
materials and methods are described herein. 

General Overview 

An aim of an HCV vaccine is to generate broad immunity to a wide breadth of 
antigens because HCV is so divergent and because humoral as well as cellular immune 
responses are desirable to combat this human pathogen. While antibodies generated 
against the envelope glycoprotein(s) might aid in virus neutralization, there is 
additional benefit to be derived from a vaccine that includes other regions. The 
likelihood of T-helper responses generated against a polypeptide would be helpful in a 
vaccine setting as would generation of cytotoxic T cells. The non-structural region 
represents such a candidate antigen, but processing by the protease generates several 
polypeptides, making purification complicated. It would be advantageous, therefore, to
derive a non-structural cassette that is unprocessed by the NS3 protease. 

The present invention solves this and other problems using compositions and 
methods involving an N-terminal deletion in NS3, which removes the catalytic domain. 
As such, some or all of the remainder of the non-structural region (through NS5B) is 
expressed as an intact polypeptide. Expression of this species has been documented in 
mammalian cells as well as in yeast. Further, in certain aspects, polynucleotides 
encoding HCV core polypeptides (or fragments thereof) are added (e.g., operably
linked) to the carboxy-terminus of the non-structural cassette. As the core coding
region is relatively highly conserved among HCV isolates, the presence of this region
may enhance the immune response. Because core has at its C-terminus a very
hydrophobic domain (amino acids 174-191), shorter versions of core were also
engineered onto the polypeptide. As described in detail herein, the truncation of core to
amino acid 121 yielded higher expression than the amino acid 173 truncation when
engineered onto the C-terminus of the mutant NS polypeptide. The combination of
most of the non-structural region fused to a C-terminally truncated core into a 
polypeptide is novel and has advantages for vaccine immunization. Moreover, because 
the aim is not necessarily to generate antibody responses to this polypeptide, there is no 
need to maintain a native conformation, enabling a more facile purification protocol. 

Mutant HCV Non-Structural Polypeptides 

Genomes of HCV strains contain a single open reading frame of approximately
9,000 to 12,000 nucleotides, which is translated into a polyprotein. An HCV
polyprotein is cleaved to produce at least ten distinct products, in the order of
NH2-Core-E1-E2-p7-NS2-NS3-NS4a-NS4b-NS5a-NS5b-COOH. Mutant HCV
polypeptides of the invention contain an N-terminal deletion in NS3, which removes or
disables the catalytic domain. Preferably, the polypeptides also include the remainder
of the non-structural region, although in certain embodiments, the polypeptides may
include less than all of the remaining NS polypeptides, for example mutant NS
polypeptides including any combinations of NS2-NS3-NS4a-NS4b-NS5a-NS5b (e.g.,
NS3-NS5a-NS5b; NS3-NS4a-NS4b; NS3-NS4a-NS4b-NS5a; NS3-NS4b-NS5a-
NS5b; NS3-NS4a-NS5a; NS3-NS4b-NS5a; NS3-NS4b-NS5b; etc.).

The HCV NS3 protein functions as a protease and a helicase and occurs at
approximately amino acid 1027 to amino acid 1657 of the polyprotein (numbered
relative to HCV-1). See Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455.
HCV NS4 occurs at approximately amino acid 1658 to amino acid 1972, NS5a occurs
at approximately amino acid 1973 to amino acid 2420, and HCV NS5b occurs at
approximately amino acid 2421 to amino acid 3011 of the polyprotein (numbered
relative to HCV-1) (Choo et al., 1991).
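
The approximate coordinates given above can be collected into a small lookup table. The following Python sketch is illustrative only; the names REGIONS, region_length, extract_region and are_contiguous are hypothetical helpers, and the boundaries are the approximate HCV-1 positions stated in the text.

```python
# Approximate HCV-1 polyprotein coordinates (1-based, inclusive) from the text.
REGIONS = {
    "NS3": (1027, 1657),
    "NS4": (1658, 1972),
    "NS5a": (1973, 2420),
    "NS5b": (2421, 3011),
}


def region_length(name: str) -> int:
    """Number of residues in the named region."""
    start, end = REGIONS[name]
    return end - start + 1


def extract_region(polyprotein: str, name: str) -> str:
    """Slice the named region out of a full-length polyprotein sequence."""
    start, end = REGIONS[name]
    return polyprotein[start - 1:end]


def are_contiguous(names: list) -> bool:
    """True if each listed region begins right after the previous one ends."""
    pairs = zip(names, names[1:])
    return all(REGIONS[a][1] + 1 == REGIONS[b][0] for a, b in pairs)
```

With these coordinates the four regions are contiguous and together span residues 1027 through 3011.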

The mutant polypeptides described herein can either be full-length polypeptides
or portions of NS3, NS4 (NS4a and NS4b), NS5a, and NS5b polypeptides. Epitopes of
NS3, NS4 (NS4a and NS4b), NS5a, NS5b, NS3NS4NS5a, and NS3NS4NS5aNS5b can
be identified by several methods. For example, NS3, NS4, NS5a, NS5b polypeptides
or fusion proteins comprising any combination of the above, can be isolated, for
example, by immunoaffinity purification using a monoclonal antibody for the
polypeptide or protein. The isolated protein sequence can then be screened by
preparing a series of short peptides by proteolytic cleavage of the purified protein,
which together span the entire protein sequence. By starting with, for example,
100-mer polypeptides, each polypeptide can be tested for the presence of epitopes
recognized by a T cell receptor on an HCV-activated T cell; progressively smaller and
overlapping fragments can then be tested from an identified 100-mer to map the epitope
of interest.
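
The progressive mapping strategy above relies on sets of overlapping fragments that together cover a parent sequence. A minimal Python sketch of how such a set could be enumerated follows; the function name and the length and step parameters are assumptions made for illustration, and the text does not prescribe particular fragment sizes beyond the 100-mer starting point.

```python
from typing import List, Tuple


def overlapping_fragments(sequence: str, length: int, step: int) -> List[Tuple[int, str]]:
    """Return (1-based start, fragment) pairs that together span the sequence.

    Consecutive fragments overlap whenever step < length. A final fragment is
    added, if needed, so that the C-terminal residues are always covered.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]
```

For example, a positive 100-mer could be passed back through the same function with a smaller length and step to produce the next, finer set of fragments to test.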

Epitopes recognized by a T cell receptor on an HCV-activated T cell can be
identified by, for example, 51Cr release assay (see Example 2) or by
lymphoproliferation assay (see Example 4). In a 51Cr release assay, target cells can be
constructed that display the epitope of interest by cloning a polynucleotide encoding the
epitope into an expression vector and transforming the expression vector into the target
cells. Non-structural polypeptides can occur in any order in the fusion protein. If
desired, at least 2, 3, 4, 5, 6, 7, 8, 9, or 10 or more of one or more of the polypeptides
may occur in the fusion protein. Multiple viral strains of HCV occur, and NS3, NS4,
NS5a, and NS5b polypeptides of any of these strains can be used in a fusion protein.

Nucleic acid and amino acid sequences of a number of HCV strains and
isolates, including nucleic acid and amino acid sequences of NS3, NS4, NS5a, NS5b
genes and polypeptides have been determined. For example, isolate HCV J1.1 is
described in Kubo et al. (1989) Japan. Nucl. Acids Res. 17:10367-10372; Takeuchi et
al. (1990) Gene 91:287-291; Takeuchi et al. (1990) J. Gen. Virol. 71:3027-3033; and
Takeuchi et al. (1990) Nucl. Acids Res. 18:4626. The complete coding sequences of
two independent isolates, HCV-J and BK, are described by Kato et al. (1990) Proc.
Natl. Acad. Sci. USA 87:9524-9528 and Takamizawa et al. (1991) J. Virol.
65:1105-1113, respectively.

Publications that describe HCV-1 isolates include Choo et al. (1990) Brit. Med.
Bull. 46:423-441; Choo et al. (1991) Proc. Natl. Acad. Sci. USA 88:2451-2455 and
Han et al. (1991) Proc. Natl. Acad. Sci. USA 88:1711-1715. HCV isolates HC-J1 and
HC-J4 are described in Okamoto et al. (1991) Japan J. Exp. Med. 60:167-177. HCV
isolates HCT 18, HCT 23, Th, HCT 27, EC1 and EC10 are described in Weiner et al.
(1991) Virol. 180:842-848. HCV isolates Pt-1, HCV-K1 and HCV-K2 are described in
Enomoto et al. (1990) Biochem. Biophys. Res. Commun. 170:1021-1025. HCV
isolates A, C, D & E are described in Tsukiyama-Kohara et al. (1991) Virus Genes
5:243-254.

Each of the mutant HCV polypeptides containing at least portions of NS3, NS4
and NS5 can be obtained from the same HCV strain or isolate or from different HCV
strains or isolates. Thus, each non-structural region of the polypeptide can be from the
same HCV strain or isolate or from different HCV strains or isolates. In addition
to the mutant HCV non-structural polypeptides described herein, the proteins can
contain other polypeptides derived from the HCV polyprotein. For example, it may be
desirable to include polypeptides derived from the core region of the HCV polyprotein.
This region occurs at amino acid positions 1-191 of the HCV polyprotein, numbered
relative to HCV-1. Either the full-length protein or epitopes of the full-length protein
may be used in the subject fusions, such as those epitopes found between amino acids
10-53, amino acids 10-45, amino acids 67-88, amino acids 120-130, or any of the core
epitopes identified in, e.g., Houghton et al., U.S. Patent No. 5,350,671; Chien et al.,
Proc. Natl. Acad. Sci. USA (1992) 89:10011-10015; Chien et al., J. Gastroent. Hepatol.
(1993) 8:S33-39; Chien et al., International Publication No. WO 93/00365; Chien,
D.Y., International Publication No. WO 94/01778; and commonly owned U.S. Patent
No. 6,150,087. When present, additional HCV polypeptides such as core
can be obtained from the same HCV strain or isolate or from different HCV strains or
isolates.

Preferably, the above-described mutant proteins, as well as the individual
components of these proteins, are produced recombinantly. A polynucleotide encoding
these proteins can be introduced into an expression vector which can be expressed in a
suitable expression system. A variety of bacterial, yeast, mammalian, insect and plant
expression systems are available in the art and any such expression system can be used.
Optionally, a polynucleotide encoding these proteins can be translated in a cell-free
translation system. Such methods are well known in the art. The proteins also can be
constructed by solid phase protein synthesis.

If desired, the mutant polypeptides, or the individual components of these
polypeptides, also can contain other amino acid sequences, such as amino acid linkers
or signal sequences, as well as ligands useful in protein purification, such as
glutathione-S-transferase and staphylococcal protein A.

Polynucleotides 

The polynucleotides of the present invention are not necessarily physically
derived from the nucleotide sequences shown, but can be generated in any manner,
including, for example, chemical synthesis or DNA replication or reverse transcription
or transcription. In addition, combinations of regions corresponding to that of the
designated sequences can be modified in ways known to the art to be consistent with an
intended use.

The DNA encoding the desired polypeptide, whether in fused or mature form,
and whether or not containing a signal sequence to permit secretion, can be ligated into
expression vectors suitable for any convenient host. Both eukaryotic and prokaryotic
host systems are presently used in forming recombinant polypeptides, and a summary
of some of the more common control systems and host cells is given below. The
polypeptide produced in such host cells is then isolated from lysed cells or from the
culture medium and purified to the extent needed for its intended use.

Purification can be by techniques known in the art, for example, differential
extraction, salt fractionation, chromatography on ion exchange resins, affinity
chromatography, centrifugation, alkali resolubilization of insoluble protein, and the
like. See, for example, Methods in Enzymology for a variety of methods for purifying
proteins.

Polynucleotides contain less than an entire HCV genome and can be RNA or
single- or double-stranded DNA. Preferably, the polynucleotides are isolated free of
other components, such as proteins and lipids. Polynucleotides of the invention can
also comprise other nucleotide sequences, such as sequences coding for linkers, signal
sequences, or ligands useful in protein purification such as glutathione-S-transferase
and staphylococcal protein A.

Polynucleotides encoding mutant HCV non-structural polypeptides can be
isolated from a genomic library derived from nucleic acid sequences present in, for
example, the plasma, serum, or liver homogenate of an HCV infected individual or can
be synthesized in the laboratory, for example, using an automatic synthesizer. An
amplification method such as PCR can be used to amplify polynucleotides from either
HCV genomic DNA or cDNA.

Further, while the polypeptides that are not NS3, NS4, or NS5 of HCV of the 
present invention can comprise a substantially complete viral domain, in many 
applications all that is required is that the polypeptide comprise an antigenic or 
immunogenic region of the virus. An antigenic region of a polypeptide is generally 
relatively small, typically 8 to 10 amino acids or less in length. Fragments of as few as 5
amino acids can characterize an antigenic region. These segments can correspond to
regions of, for example, C, E1, or E2 epitopes. Accordingly, using the cDNAs of C, E1,
or E2 as a basis, DNAs encoding short segments of C, E1, or E2 polypeptides can be
expressed recombinantly either as fusion proteins, or as isolated polypeptides. In 
addition, short amino acid sequences can be conveniently obtained by chemical 
synthesis. 

Polynucleotides encoding the polypeptides described herein can comprise 
coding sequences for these polypeptides which occur naturally or can be artificial 
sequences which do not occur in nature. These polynucleotides can be ligated to form a 
coding sequence for the fusion proteins using standard molecular biology techniques. 
If desired, polynucleotides can be cloned into an expression vector and transformed 
into, for example, bacterial, yeast, insect, plant or mammalian cells so that the fusion 
proteins of the invention can be expressed in and isolated from a cell culture. 

The expression of polypeptides containing these domains in a variety of 
recombinant host cells, including, for example, bacteria, yeast, insect, plant and 
vertebrate cells, gives rise to important immunological reagents which can be used for
diagnosis, detection, and vaccines. 

The general techniques used in extracting the genome from a virus, preparing 
and probing a cDNA library, sequencing clones, constructing expression vectors, 
transforming cells, performing immunological assays such as radioimmunoassays and
ELISA assays, growing cells in culture, and the like are known in the art and
laboratory manuals are available describing these techniques. However, as a general
guide, the following sets forth some sources currently available for such procedures,
and for materials useful in carrying them out. 

Both prokaryotic and eukaryotic host cells may be used for expression of
desired coding sequences when appropriate control sequences which are compatible
with the designated host are used. Among prokaryotic hosts, E. coli is most frequently
used. Expression control sequences for prokaryotes include promoters, optionally
containing operator portions, and ribosome binding sites. Transfer vectors compatible
with prokaryotic hosts are commonly derived from, for example, pBR322, a plasmid
containing operons conferring ampicillin and tetracycline resistance, and the various
pUC vectors, which also contain sequences conferring antibiotic resistance markers.
These markers may be used to obtain successful transformants by selection. Commonly
used prokaryotic control sequences include the Beta-lactamase (penicillinase) and
lactose promoter systems (Chang et al. (1977), Nature 198:1056), the tryptophan (trp)
promoter system (Goeddel et al. (1980) Nucleic Acid Res. 8:4057), the lambda-derived
P[L] promoter and N gene ribosome binding site (Shimatake et al. (1981) Nature
292:128) and the hybrid tac promoter (De Boer et al. (1983) Proc. Natl. Acad. Sci.
U.S.A. 292:128) derived from sequences of the trp and lac UV5 promoters. The
foregoing systems are particularly compatible with E. coli; if desired, other prokaryotic
hosts such as strains of Bacillus or Pseudomonas may be used, with corresponding
control sequences.

Eukaryotic hosts include mammalian and yeast cells in culture systems.
Mammalian cell lines available as hosts for expression are known in the art and include
many immortalized cell lines available from the American Type Culture Collection
(ATCC), including HeLa cells, Chinese hamster ovary (CHO) cells, baby hamster
kidney (BHK) cells, and a number of other cell lines. Suitable promoters for
mammalian cells are also known in the art and include viral promoters such as that
from Simian Virus 40 (SV40) (Fiers (1978), Nature 273:113), Rous sarcoma virus
(RSV), adenovirus (ADV), and bovine papillomavirus (BPV). Mammalian cells may
also require terminator sequences and poly A addition sequences; enhancer sequences
which increase expression may also be included, and sequences which cause
amplification of the gene may also be desirable. These sequences are known in the art.
Vectors suitable for replication in mammalian cells may include viral replicons, or
sequences which ensure integration of the appropriate sequences encoding NANBV
epitopes into the host genome.

The vaccinia virus system can also be used to express foreign DNA in 
mammalian cells. To express heterologous genes, the foreign DNA is usually inserted 
into the thymidine kinase gene of the vaccinia virus and then infected cells can be
selected. This procedure is known in the art and further information can be found in 
these references (Mackett et al. J. Virol. 49: 857-864 (1984) and Chapter 7 in DNA 
Cloning, Vol. 2, IRL Press). 

Yeast expression systems are also known to one of ordinary skill in the art. A
yeast promoter is any DNA sequence capable of binding yeast RNA polymerase and
initiating the downstream (3') transcription of a coding sequence (e.g., structural gene)
into mRNA. A promoter will have a transcription initiation region which is usually
placed proximal to the 5' end of the coding sequence. This transcription initiation
region usually includes an RNA polymerase binding site (the "TATA Box") and a
transcription initiation site. A yeast promoter may also have a second domain called an
upstream activator sequence (UAS), which, if present, is usually distal to the structural
gene. The UAS permits regulated (inducible) expression. Constitutive expression
occurs in the absence of a UAS. Regulated expression may be either positive or
negative, thereby either enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore,
sequences encoding enzymes in the metabolic pathway provide particularly useful
promoter sequences. Examples include alcohol dehydrogenase (ADH) (EP-A-0 284
044), enolase, glucokinase, glucose-6-phosphate isomerase,
glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK)
(EPO-A-0 329 203). The yeast PHO5 gene, encoding acid phosphatase, also provides
useful promoter sequences (Miyanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1).

In addition, synthetic promoters which do not occur in nature also function as
yeast promoters. For example, UAS sequences of one yeast promoter may be joined
with the transcription activation region of another yeast promoter, creating a synthetic
hybrid promoter. Examples of such hybrid promoters include the ADH regulatory
sequence linked to the GAP transcription activation region (US Patent Nos. 4,876,197
and 4,880,734). Other examples of hybrid promoters include promoters which consist
of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as
GAP or PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally
occurring promoters of non-yeast origin that have the ability to bind yeast RNA
polymerase and initiate transcription. Examples of such promoters include, inter alia,
(Cohen et al. (1980) Proc. Natl. Acad. Sci. USA 77:1078; Henikoff et al. (1981)
Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol. Immunol. 96:119;
Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance Genes in
the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al.
(1980) Gene 11:163; Panthier et al. (1980) Curr. Genet. 2:109).

A DNA molecule may be expressed intracellularly in yeast. A promoter
sequence may be directly linked with the DNA molecule, in which case the first amino
acid at the N-terminus of the recombinant protein will always be a methionine, which is
encoded by the ATG start codon. If desired, methionine at the N-terminus may be
cleaved from the protein by in vitro incubation with cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as
in mammalian, baculovirus, and bacterial expression systems. Usually, a DNA
sequence encoding the N-terminal portion of an endogenous yeast protein, or other
stable protein, is fused to the 5' end of heterologous coding sequences. Upon
expression, this construct will provide a fusion of the two amino acid sequences. For
example, the yeast or human superoxide dismutase (SOD) gene can be linked at the 5'
terminus of a foreign gene and expressed in yeast. The DNA sequence at the junction
of the two amino acid sequences may or may not encode a cleavable site. See e.g.,
EP-A-0 196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is
made with the ubiquitin region that preferably retains a site for a processing enzyme
(e.g., ubiquitin-specific processing protease) to cleave the ubiquitin from the foreign
protein. Through this method, therefore, native foreign protein can be isolated (e.g.,
WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth
media by creating chimeric DNA molecules that encode a fusion protein comprised of a
leader sequence fragment that provides for secretion in yeast of the foreign protein.
Preferably, there are processing sites encoded between the leader fragment and the
foreign gene that can be cleaved either in vivo or in vitro. The leader sequence
fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted
yeast proteins, such as the yeast invertase gene (EP-A-0 012 873; JPO 62,096,086)
and the A-factor gene (US patent 4,588,684). Alternatively, leaders of non-yeast
origin, such as an interferon leader, exist that also provide for secretion in yeast
(EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the
yeast alpha-factor gene, which contains both a "pre" signal sequence, and a "pro"
region. The types of alpha-factor fragments that can be employed include the
full-length pre-pro alpha factor leader (about 83 amino acid residues) as well as truncated
alpha-factor leaders (usually about 25 to about 50 amino acid residues) (US Patents
4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing an
alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders
made with a presequence of a first yeast, but a pro-region from a second yeast
alpha-factor (e.g., see WO 89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory
regions located 3' to the translation stop codon, and thus together with the promoter
flank the coding sequence. These sequences direct the transcription of an mRNA
which can be translated into the polypeptide encoded by the DNA. Examples include
transcription terminator sequences and other yeast-recognized termination sequences,
such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if
desired), coding sequence of interest, and transcription termination sequence, are put
together into expression constructs. Expression constructs are often maintained in a
replicon, such as an extrachromosomal element (e.g., plasmids) capable of stable
maintenance in a host, such as yeast or bacteria. The replicon may have two replication
systems, thus allowing it to be maintained, for example, in yeast for expression and in a
prokaryotic host for cloning and amplification. Examples of such yeast-bacteria shuttle
vectors include YEp24 (Botstein et al. (1979) Gene 8:17-24), pC1/1 (Brake et al.
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646), and YRp17 (Stinchcomb et al.
(1982) J. Mol. Biol. 158:157). In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number
ranging from about 5 to about 200, and usually about 10 to about 150. A host
containing a high copy number plasmid will preferably have at least about 10, and more
preferably at least about 20. Either a high or low copy number vector may be selected,
depending upon the effect of the vector and the foreign protein on the host. See e.g.,
Brake et al., supra.
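
The ordering of components in an expression construct, as described above, can be pictured with a small data structure. The Python sketch below is purely illustrative: the class name, field names and the example labels are hypothetical, and the fields hold descriptive labels rather than real nucleotide sequences.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExpressionConstruct:
    """Labels for the parts of an expression construct, in 5' to 3' order."""
    promoter: str
    coding_sequence: str
    terminator: str
    leader: Optional[str] = None  # secretion leader, included only if desired

    def ordered_parts(self) -> List[str]:
        """Return promoter, optional leader, coding sequence, terminator."""
        parts = [self.promoter]
        if self.leader is not None:
            parts.append(self.leader)
        parts.extend([self.coding_sequence, self.terminator])
        return parts


# Illustrative labels only; these do not name a construct from the specification.
example = ExpressionConstruct(
    promoter="ADH/GAP hybrid promoter",
    coding_sequence="mutant NS polypeptide coding sequence",
    terminator="glycolytic gene terminator",
    leader="alpha-factor leader",
)
```

Omitting the leader models intracellular expression, in which the promoter is linked directly to the coding sequence.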

Alternatively, the expression constructs can be integrated into the yeast genome
with an integrating vector. Integrating vectors usually contain at least one sequence
homologous to a yeast chromosome that allows the vector to integrate, and preferably
contain two homologous sequences flanking the expression construct. Integrations
appear to result from recombinations between homologous DNA in the vector and the
yeast chromosome (Orr-Weaver et al. (1983) Methods in Enzymol. 101:228-245). An
integrating vector may be directed to a specific locus in yeast by selecting the
appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of
recombinant protein produced (Rine et al. (1983) Proc. Natl. Acad. Sci. USA
80:6750). The chromosomal sequences included in the vector can occur either as a
single segment in the vector, which results in the integration of the entire vector, or two
segments homologous to adjacent segments in the chromosome and flanking the
expression construct in the vector, which can result in the stable integration of only the
expression construct.

Usually, extrachromosomal and integrating expression constructs may contain
selectable markers to allow for the selection of yeast strains that have been transformed.
Selectable markers may include biosynthetic genes that can be expressed in the yeast
host, such as ADE2, HIS4, LEU2, TRP1, and ALG7, and the G418 resistance gene,
which confer resistance in yeast cells to tunicamycin and G418, respectively. In
addition, a suitable selectable marker may also provide yeast with the ability to grow in
the presence of toxic compounds, such as metal. For example, the presence of CUP1
allows yeast to grow in the presence of copper ions (Butt et al. (1987) Microbiol. Rev.
51:351).

Alternatively, some of the above described components can be put together into 
transformation vectors. Transformation vectors are usually comprised of a selectable 
marker that is either maintained in a replicon or developed into an integrating vector, as
described above. 

Expression and transformation vectors, either extrachromosomal replicons or
integrating vectors, have been developed for transformation into many yeasts. For
example, expression vectors have been developed for, inter alia, the following yeasts:
Candida albicans (Kurtz et al. (1986) Mol. Cell. Biol. 6:142), Candida maltosa
(Kunze et al. (1985) J. Basic Microbiol. 25:141), Hansenula polymorpha (Gleeson
et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen.
Genet. 202:302), Kluyveromyces fragilis (Das et al. (1984) J. Bacteriol. 158:1165),
Kluyveromyces lactis (De Louvencourt et al. (1983) J. Bacteriol. 154:737; Van den
Berg et al. (1990) Bio/Technology 8:135), Pichia guillerimondii (Kunze et al. (1985)
J. Basic Microbiol. 25:141), Pichia pastoris (Cregg et al. (1985) Mol. Cell. Biol.
5:3376; US Patent Nos. 4,837,148 and 4,929,555), Saccharomyces cerevisiae (Hinnen
et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol.
153:163), Schizosaccharomyces pombe (Beach and Nurse (1981) Nature 300:706), and
Yarrowia lipolytica (Davidow et al. (1985) Curr. Genet. 10:380471; Gaillardin et al.
(1985) Curr. Genet. 10:49).

Methods of introducing exogenous DNA into yeast hosts are well-known in the
art, and usually include either the transformation of spheroplasts or of intact yeast cells
treated with alkali cations. Transformation procedures usually vary with the yeast
species to be transformed. (See e.g., Kurtz et al. (1986) Mol. Cell. Biol. 6:142;
Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida; Gleeson et al. (1986) J.
Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet. 202:302;
Hansenula; Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:1165; Van den Berg et al. (1990) Bio/Technology 8:135;
Kluyveromyces; Cregg et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J.
Basic Microbiol. 25:141; US Patent Nos. 4,837,148 and 4,929,555; Pichia; Hinnen et
al. (1978) Proc. Natl. Acad. Sci. USA 75:1929; Ito et al. (1983) J. Bacteriol.
153:163; Saccharomyces; Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces; Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al.
(1985) Curr. Genet. 10:49; Yarrowia).

Bacterial expression techniques are known in the art. A bacterial promoter is
any DNA sequence capable of binding bacterial RNA polymerase and initiating the
downstream (3') transcription of a coding sequence (e.g., structural gene) into mRNA.
A promoter will have a transcription initiation region which is usually placed proximal
to the 5' end of the coding sequence. This transcription initiation region usually
includes an RNA polymerase binding site and a transcription initiation site. A bacterial
promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator
permits negative regulated (inducible) transcription, as a gene repressor protein may
bind the operator and thereby inhibit transcription of a specific gene. Constitutive
expression may occur in the absence of negative regulatory elements, such as the
operator. In addition, positive regulation may be achieved by a gene activator protein
binding sequence, which, if present, is usually proximal (5') to the RNA polymerase
binding sequence. An example of a gene activator protein is the catabolite activator
protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli
(E. coli) (Raibaud et al. (1984) Annu. Rev. Genet. 18:173). Regulated expression
may therefore be either positive or negative, thereby either enhancing or reducing
transcription.

Expression and transformation vectors, either extra-chromosomal replicons or 
integrating vectors, have been developed for transformation into many bacteria. For 
example, expression vectors have been developed for, inter alia, the following bacteria: 

Bacillus subtilis (Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0
036 259 and EP-A-0 063 953; WO 84/04541), Escherichia coli (Shimatake et al.
(1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al. (1986) J.
Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907),
Streptococcus cremoris (Powell et al. (1988) Appl. Environ. Microbiol. 54:655);
Streptococcus lividans (Powell et al. (1988) Appl. Environ. Microbiol. 54:655),
Streptomyces lividans (US patent 4,745,056).

Methods of introducing exogenous DNA into bacterial hosts are well-known in
the art, and usually include either the transformation of bacteria treated with CaCl2 or
other agents, such as divalent cations and DMSO. DNA can also be introduced into
bacterial cells by electroporation. Transformation procedures usually vary with the
bacterial species to be transformed. (See e.g., Masson et al. (1989) FEMS Microbiol.
Lett. 60:273; Palva et al. (1982) Proc. Natl. Acad. Sci. USA 79:5582; EP-A-0 036
259 and EP-A-0 063 953; WO 84/04541, Bacillus; Miller et al. (1988) Proc. Natl.
Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter; Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res.
16:6127; Kushner (1978) "An improved method for transformation of Escherichia coli
with ColE1-derived plasmids," in: Genetic Engineering: Proceedings of the
International Symposium on Genetic Engineering (eds. H.W. Boyer and S. Nicosia);
Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo (1988) Biochim. Biophys. Acta
949:318, Escherichia; Chassy et al. (1981) FEMS Microbiol. Lett. 44:173,
Lactobacillus; Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas; Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus; Barany et al. (1980) J.
Bacteriol. 144:698; Harlander (1987) "Transformation of Streptococcus lactis by
electroporation," in: Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et
al. (1981) Infect. Immun. 32:1295; Powell et al. (1988) Appl. Environ. Microbiol.
54:655; Somkuti et al. (1987) Proc. 4th Eur. Cong. Biotechnology 1:412,
Streptococcus).

In addition, viral antigens can be expressed in insect cells by the Baculovirus
system. A general guide to Baculovirus expression by Summers and Smith is A Manual
of Methods for Baculovirus Vectors and Insect Cell Culture Procedures (Texas
Agricultural Experiment Station Bulletin No. 1555). To incorporate the heterologous
gene into the Baculovirus genome the gene is first cloned into a transfer vector
containing some Baculovirus sequences. This transfer vector, when it is cotransfected
with wild-type virus into insect cells, will recombine with the wild-type virus. Usually,
the transfer vector will be engineered so that the heterologous gene will disrupt the
wild-type Baculovirus polyhedrin gene. This disruption enables easy selection of the
recombinant virus since the cells infected with the recombinant virus will appear
phenotypically different from the cells infected with the wild-type virus. The purified
recombinant virus can be used to infect cells to express the heterologous gene. The
foreign protein can be secreted into the medium if a signal peptide is linked in frame to
the heterologous gene; otherwise, the protein will be bound in the cell lysates. For
further information, see Smith et al., Mol. & Cell. Biol. 3:2156-2165 (1983) or Luckow
and Summers in Virology 17:31-39 (1989).

Baculovirus expression can also be effected in plant cells. There are many plant
cell culture and whole plant genetic expression systems known in the art. Exemplary
plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic
expression in plant cell culture have been described by Zenk, Phytochemistry 30:3861-
3863 (1991). Descriptions of plant protein signal peptides may be found in addition to
the references described above in Vaulcombe et al., Mol. Gen. Genet. 209:33-40
(1987); Chandler et al., Plant Molecular Biology 3:407-418 (1984); Rogers, J. Biol.
Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356 (1987); Whittier et
al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the
regulation of plant gene expression by the phytohormone, gibberellic acid and secreted
enzymes induced by gibberellic acid can be found in R.L. Jones and J. MacMillin,
Gibberellins, in: Advanced Plant Physiology, Malcolm B. Wilkins, ed., 1984 Pitman
Publishing Limited, London, pp. 21-52. References that describe other metabolically-
regulated genes: Sheen, Plant Cell 2:1027-1038 (1990); Maas et al., EMBO J. 9:3447-
3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci. 84:1337-1339 (1987).

All plants from which protoplasts can be isolated and cultured to give whole
regenerated plants can be transformed by the present invention so that whole plants are
recovered which contain the transferred gene. It is known that practically all plants can
be regenerated from cultured cells or tissues, including but not limited to all major
species of sugarcane, sugar beet, cotton, fruit and other trees, legumes and vegetables.
Some suitable plants include, for example, species from the genera Fragaria, Lotus,
Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum, Geranium,
Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana,
Cichorium, Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis,
Nemesia, Pelargonium, Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis,
Cucumis, Browallia, Glycine, Lolium, Zea, Triticum, Sorghum, and Datura.

Transformation can be by any method for introducing polynucleotides into a
host cell, including, for example, packaging the polynucleotide in a virus and
transducing a host cell with the virus, and by direct uptake of the polynucleotide. The
transformation procedure used depends upon the host to be transformed. Bacterial
transformation by direct uptake generally employs treatment with calcium or rubidium
chloride (Cohen (1972), Proc. Natl. Acad. Sci. U.S.A. 69:2110; Maniatis et al. (1982),
MOLECULAR CLONING: A LABORATORY MANUAL (Cold Spring Harbor Press,
Cold Spring Harbor, N.Y.)). Yeast transformation by direct uptake may be carried out
using the method of Hinnen et al. (1978) Proc. Natl. Acad. Sci. U.S.A. 75:1929.
Mammalian transformations by direct uptake may be conducted using the calcium
phosphate precipitation method of Graham and Van der Eb (1978), Virology 52:546 or
the various known modifications thereof.

Vector construction employs techniques which are known in the art.
Site-specific DNA cleavage is performed by treating with suitable restriction enzymes
under conditions which generally are specified by the manufacturer of these
commercially available enzymes. The cleaved fragments may be separated using
polyacrylamide or agarose gel electrophoresis techniques, according to the general
procedures found in Methods in Enzymology (1980) 65:499-560. Sticky ended
cleavage fragments may be blunt ended using E. coli DNA polymerase I (Klenow) in
the presence of the appropriate deoxynucleotide triphosphates (dNTPs). Treatment
with S1 nuclease may also be used, resulting in the hydrolysis of any single stranded
DNA portions.

Ligations are carried out using standard buffer and temperature conditions using
T4 DNA ligase and ATP; sticky end ligations require less ATP and less ligase than
blunt end ligations. When vector fragments are used as part of a ligation mixture, the
vector fragment is often treated with bacterial alkaline phosphatase (BAP) or calf
intestinal alkaline phosphatase to remove the 5'-phosphate and thus prevent religation
of the vector; alternatively, restriction enzyme digestion of unwanted fragments can be
used to prevent ligation. Ligation mixtures are transformed into suitable cloning hosts,
such as E. coli, and successful transformants are selected by, for example, antibiotic
resistance, and screened for the correct construction.

Synthetic oligonucleotides may be prepared using an automated oligonucleotide
synthesizer as described by Warner (1984), DNA 3:401. If desired, the synthetic strands
may be labeled with 32P by treatment with polynucleotide kinase in the presence of
32P-ATP, using standard conditions for the reaction. DNA sequences, including those
isolated from cDNA libraries, may be modified by known techniques, including, for
example, site directed mutagenesis, as described by Zoller (1982), Nucleic Acids Res.
10:6487.

The expression constructs of the present invention, including the desired fusion,
or individual expression constructs comprising the individual components of these
fusions, may be used for nucleic acid immunization, to activate HCV-specific T cells,
using standard gene delivery protocols. Methods for gene delivery are known in the
art. See, e.g., U.S. Patent Nos. 5,399,346, 5,580,859, 5,589,466. Genes can be
delivered either directly to the vertebrate subject or, alternatively, delivered ex vivo, to
cells derived from the subject and the cells reimplanted in the subject. For example, the
constructs can be delivered as plasmid DNA, e.g., contained within a plasmid, such as
pBR322, pUC, or ColE1.

Additionally, the expression constructs can be packaged in liposomes prior to
delivery to the cells. Lipid encapsulation is generally accomplished using liposomes
which are able to stably bind or entrap and retain nucleic acid. The ratio of condensed
DNA to lipid preparation can vary but will generally be around 1:1 (mg
DNA:micromoles lipid), or more of lipid. For a review of the use of liposomes as
carriers for delivery of nucleic acids, see Hug and Sleight, Biochim. Biophys. Acta
(1991) 1097:1-17; Straubinger et al., in Methods of Enzymology (1983), Vol. 101, pp.
512-527.
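
The 1:1 guideline (mg DNA:micromoles lipid) stated above amounts to simple
arithmetic. The following minimal sketch illustrates it; the function name and the
ratio parameter are hypothetical and are not part of the disclosure.

```python
def lipid_micromoles(dna_mg, umol_lipid_per_mg_dna=1.0):
    """Return the micromoles of lipid to combine with a given mass of DNA.

    The default of 1.0 reflects the approximately 1:1 (mg DNA:micromoles lipid)
    ratio given in the text; a larger value models "more of lipid".
    Both parameter names are illustrative assumptions.
    """
    if dna_mg < 0:
        raise ValueError("DNA mass must be non-negative")
    if umol_lipid_per_mg_dna <= 0:
        raise ValueError("ratio must be positive")
    return dna_mg * umol_lipid_per_mg_dna


# 0.5 mg of DNA at the nominal 1:1 ratio
print(lipid_micromoles(0.5))
```

Raising the ratio argument above 1.0 corresponds to a preparation with lipid in
excess of the nominal ratio.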

Liposomal preparations for use with the present invention include cationic
(positively charged), anionic (negatively charged) and neutral preparations, with
cationic liposomes particularly preferred. Cationic liposomes are readily available. For
example, N-[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium (DOTMA) liposomes
are available under the trademark Lipofectin, from GIBCO BRL, Grand Island, NY.
(See, also, Felgner et al., Proc. Natl. Acad. Sci. USA (1987) 84:7413-7416). Other
commercially available lipids include Transfectace (DDAB/DOPE) and DOTAP/DOPE
(Boehringer). Other cationic liposomes can be prepared from readily available
materials using techniques well known in the art. See, e.g., Szoka et al., Proc. Natl.
Acad. Sci. USA (1978) 75:4194-4198; PCT Publication No. WO 90/11092 for a
description of the synthesis of DOTAP (1,2-bis(oleoyloxy)-3-
(trimethylammonio)propane) liposomes. The various liposome-nucleic acid complexes
are prepared using methods known in the art. See, e.g., Straubinger et al., in
METHODS OF IMMUNOLOGY (1983), Vol. 101, pp. 512-527; Szoka et al., Proc.
Natl. Acad. Sci. USA (1978) 75:4194-4198; Papahadjopoulos et al., Biochim. Biophys.
Acta (1975) 394:483; Wilson et al., Cell (1979) 17:77; Deamer and Bangham,
Biochim. Biophys. Acta (1976) 443:629; Ostro et al., Biochem. Biophys. Res. Commun.
(1977) 76:836; Fraley et al., Proc. Natl. Acad. Sci. USA (1979) 76:3348; Enoch and
Strittmatter, Proc. Natl. Acad. Sci. USA (1979) 76:145; Fraley et al., J. Biol. Chem.
(1980) 255:10431; Szoka and Papahadjopoulos, Proc. Natl. Acad. Sci. USA (1978)
75:145; and Schaefer-Ridder et al., Science (1982) 215:166.

The DNA can also be delivered in cochleate lipid compositions similar to those
described by Papahadjopoulos et al., Biochim. Biophys. Acta (1975) 394:483-491.
See, also, U.S. Patent Nos. 4,663,161 and 4,871,488.

A number of viral based systems have been developed for gene transfer into
mammalian cells. For example, retroviruses, such as murine sarcoma virus, mouse
mammary tumor virus, Moloney murine leukemia virus, and Rous sarcoma virus,
provide a convenient platform for gene delivery systems. A selected gene can be
inserted into a vector and packaged in retroviral particles using techniques known in the
art. The recombinant virus can then be isolated and delivered to cells of the subject
either in vivo or ex vivo. A number of retroviral systems have been described (U.S.
Patent No. 5,219,740; Miller and Rosman, BioTechniques (1989) 7:980-990; Miller,
A.D., Human Gene Therapy (1990) 1:5-14; Scarpa et al., Virology (1991) 180:849-852;
Burns et al., Proc. Natl. Acad. Sci. USA (1993) 90:8033-8037; and Boris-Lawrie and
Temin, Cur. Opin. Genet. Develop. (1993) 3:102-109). Briefly, retroviral gene delivery
vehicles of the present invention may be readily constructed from a wide variety of
retroviruses, including for example, B, C, and D type retroviruses as well as
spumaviruses and lentiviruses such as FIV, HIV, HIV-1, HIV-2 and SIV (see RNA
Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985). Such
retroviruses may be readily obtained from depositories or collections such as the
American Type Culture Collection ("ATCC"; 10801 University Blvd., Manassas, VA
20110-2209), or isolated from known sources using commonly available techniques.
A number of adenovirus vectors have also been described, such as adenovirus
Type 2 and Type 5 vectors. Unlike retroviruses, which integrate into the host genome,
adenoviruses persist extrachromosomally, thus minimizing the risks associated with
insertional mutagenesis (Haj-Ahmad and Graham, J. Virol. (1986) 57:267-274; Bett et
al., J. Virol. (1993) 67:5911-5921; Mittereder et al., Human Gene Therapy (1994)
5:717-729; Seth et al., J. Virol. (1994) 68:933-940; Barr et al., Gene Therapy (1994)
1:51-58; Berkner, K.L., BioTechniques (1988) 6:616-629; and Rich et al., Human Gene
Therapy (1993) 4:461-476).

Molecular conjugate vectors, such as the adenovirus chimeric vectors described
in Michael et al., J. Biol. Chem. (1993) 268:6866-6869 and Wagner et al., Proc. Natl.
Acad. Sci. USA (1992) 89:6099-6103, can also be used for gene delivery.

Members of the Alphavirus genus, such as but not limited to vectors derived
from the Sindbis and Semliki Forest viruses and VEE, will also find use as viral
vectors for delivering the gene of interest. For a description of Sindbis-virus derived
vectors useful for the practice of the instant methods, see Dubensky et al., J. Virol.
(1996) 70:508-519; and International Publication Nos. WO 95/07995 and WO 96/17072.

Other vectors can be used, including but not limited to simian virus 40 and
cytomegalovirus. Bacterial vectors, such as Salmonella spp., Yersinia enterocolitica,
Shigella spp., Vibrio cholerae, Mycobacterium strain BCG, and Listeria
monocytogenes can be used. Minichromosomes such as MC and MC1, bacteriophages,
cosmids (plasmids into which phage lambda cos sites have been inserted) and replicons
(genetic elements that are capable of replication under their own control in a cell) can
also be used.

The expression constructs may also be encapsulated, adsorbed to, or associated
with, particulate carriers. Such carriers present multiple copies of a selected molecule
to the immune system and promote trapping and retention of molecules in local lymph
nodes. The particles can be phagocytosed by macrophages and can enhance antigen
presentation through cytokine release. Examples of particulate carriers include those
derived from polymethyl methacrylate polymers, as well as microparticles derived from
poly(lactides) and poly(lactide-co-glycolides), known as PLG. See, e.g., Jeffery et al.,
Pharm. Res. (1993) 10:362-368; and McGee et al., J. Microencap. (1996).

A wide variety of other methods can be used to deliver the expression
constructs to cells. Such methods include DEAE dextran-mediated transfection,
calcium phosphate precipitation, polylysine- or polyornithine-mediated transfection, or
precipitation using other insoluble inorganic salts, such as strontium phosphate,
aluminum silicates including bentonite and kaolin, chromic oxide, magnesium silicate,
talc, and the like. Other useful methods of transfection include electroporation,
sonoporation, protoplast fusion, liposomes, peptoid delivery, or microinjection. See,
e.g., Sambrook et al., supra, for a discussion of techniques for transforming cells of
interest; and Felgner, P.L., Advanced Drug Delivery Reviews (1990) 5:163-187, for a
review of delivery systems useful for gene transfer. One particularly effective method
of delivering DNA using electroporation is described in International Publication No.
WO 00/45823.

Additionally, biolistic delivery systems employing particulate carriers, such as
gold and tungsten, are especially useful for delivering the expression constructs of the
present invention. The particles are coated with the construct to be delivered and
accelerated to high velocity, generally under a reduced atmosphere, using a gun powder
discharge from a "gene gun." For a description of such techniques, and apparatuses
useful therefor, see, e.g., U.S. Patent Nos. 4,945,050; 5,036,006; 5,100,792;
5,179,022; 5,371,015; and 5,478,744.

Compositions 

The invention also provides compositions comprising the HCV polypeptides or
polynucleotides described herein. Such compositions are useful as diagnostics, for
example, using the mutant polypeptides (or polynucleotides encoding these
polypeptides) in diagnostic reagents. Diagnostics using polypeptides and
polynucleotides are known to those of skill in the art.

In addition, immunogenic compounds can be prepared from one or more
immunogenic polypeptides derived from the polypeptides described herein, for
example the ANS35 polypeptide. The preparation of immunogenic compounds which
contain immunogenic polypeptide(s) as active ingredients is known to one skilled in the
art. Typically, such immunogenic compounds are prepared as injectables, either as
liquid solutions or suspensions; solid forms suitable for solution in, or suspension in,
liquid prior to injection can also be prepared. The preparation can also be emulsified, or
the protein encapsulated in liposomes.

Immunogenic and diagnostic compositions of the invention preferably comprise
a pharmaceutically acceptable carrier. The carrier should not itself induce the
production of antibodies harmful to the host. Pharmaceutically acceptable carriers are
well known to those in the art. Such carriers include, but are not limited to, large,
slowly metabolized macromolecules, such as proteins, polysaccharides such as latex
functionalized sepharose, agarose, cellulose, cellulose beads and the like, polylactic
acids, polyglycolic acids, polymeric amino acids such as polyglutamic acid, polylysine,
and the like, amino acid copolymers, and inactive virus particles.

Pharmaceutically acceptable salts can also be used in compositions of the
invention, for example, mineral salts such as hydrochlorides, hydrobromides,
phosphates, or sulfates, as well as salts of organic acids such as acetates, propionates,
malonates, or benzoates. Especially useful protein substrates are serum albumins,
keyhole limpet hemocyanin, immunoglobulin molecules, thyroglobulin, ovalbumin,
tetanus toxoid, and other proteins well known to those of skill in the art. Compositions
of the invention can also contain liquids or excipients, such as water, saline, glycerol,
dextrose, ethanol, or the like, singly or in combination, as well as substances such as
wetting agents, emulsifying agents, or pH buffering agents. Liposomes can also be
used as a carrier for a composition of the invention; such liposomes are described
above.

If desired, co-stimulatory molecules which improve immunogen presentation to
lymphocytes, such as B7-1 or B7-2, or cytokines such as GM-CSF, IL-2, and IL-12,
can be included in a composition of the invention. Optionally, adjuvants can also be
included in a composition. Adjuvants which can be used include, but are not limited to:
(1) aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate,
aluminum sulfate, etc.; (2) oil-in-water emulsion formulations (with or without other
specific immunostimulating agents such as muramyl peptides (see below) or bacterial
cell wall components), such as for example (a) MF59 (PCT Publ. No. WO 90/14837),
containing 5% Squalene, 0.5% Tween 80, and 0.5% Span 85 (optionally containing
various amounts of MTP-PE), formulated into submicron particles using a
microfluidizer such as Model 110Y microfluidizer (Microfluidics, Newton, MA),
(b) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-blocked polymer
L121, and thr-MDP (see below) either microfluidized into a submicron emulsion or
vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system
(RAS) (Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80,
and one or more bacterial cell wall components from the group consisting of
monophosphoryl lipid A (MPL), trehalose dimycolate (TDM), and cell wall skeleton
(CWS), preferably MPL + CWS (Detox™); (3) saponin adjuvants, such as Stimulon™
(Cambridge Bioscience, Worcester, MA), or particles generated therefrom
such as ISCOMs (immunostimulating complexes); (4) Complete Freund's Adjuvant
(CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins
(e.g., IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (e.g., gamma
interferon), macrophage colony stimulating factor (M-CSF), tumor necrosis factor
(TNF), etc.; (6) detoxified mutants of a bacterial ADP-ribosylating toxin such as a
cholera toxin (CT), a pertussis toxin (PT), or an E. coli heat-labile toxin (LT),
particularly LT-K63, LT-R72, CT-S109, PT-K9/G129; see, e.g., WO 93/13302 and
WO 92/19265; (7) other substances that act as immunostimulating agents to enhance
the effectiveness of the composition; and (8) microparticles with adsorbed
macromolecules, as described in copending U.S. Patent Application Serial No.
09/285,855 (filed April 2, 1999) and International Patent Application Serial No.
PCT/US99/17308 (filed July 29, 1999). Alum and MF59 are preferred. The
effectiveness of an adjuvant can be determined by measuring the amount of antibodies
directed against an immunogenic polypeptide containing an HCV antigenic sequence
resulting from administration of this polypeptide in immunogenic compounds which
are also comprised of the various adjuvants.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (CGP 11637, referred to as nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(CGP 19835A, referred to as MTP-PE), etc.

Thus, such recombinant or synthetic HCV polypeptides can be used in vaccines
and as diagnostics. Further, antibodies raised against these polypeptides can also be
used as diagnostics, or for passive immunotherapy. In addition, antibodies to these
polypeptides are useful for isolating and identifying HCV particles.

Native HCV antigens can also be isolated from HCV virions. The virions can be
grown in HCV infected cells in tissue culture, or in an infected host.

Administration and Delivery 

The polynucleotide and polypeptide compositions described herein (e.g.,
immunogenic compounds) may be administered to a subject using any suitable delivery
means. Methods of delivering nucleic acids into host cells are discussed above.
Further, HCV polynucleotides and/or polypeptides can be administered parenterally, by
injection, usually subcutaneously, intramuscularly, transdermally or transcutaneously.
Certain adjuvants, e.g., LTK63, LTR72 or PLG formulations, can be administered
intranasally or orally. Additional formulations which are suitable for other modes of
administration include suppositories. For suppositories, traditional binders and carriers
can include, for example, polyalkylene glycols or triglycerides; such suppositories can
be formed from mixtures containing the active ingredient in the range of 0.5% to 10%,
preferably 1%-2%. Other oral formulations include such normally employed excipients
as, for example, pharmaceutical grades of mannitol, lactose, starch, magnesium
stearate, sodium saccharine, cellulose, magnesium carbonate, and the like. These
compositions take the form of solutions, suspensions, tablets, pills, capsules, sustained
release formulations or powders and contain 10%-95% of active ingredient, preferably
25%-70%.

The polypeptides of the present invention can be formulated into the
immunogenic compound as neutral or salt forms. Pharmaceutically acceptable salts
include the acid addition salts (formed with free amino groups of the peptide), which
are formed with inorganic acids such as, for example, hydrochloric or
phosphoric acids, or such organic acids as acetic, oxalic, tartaric, maleic, and the
like. Salts formed with the free carboxyl groups can also be derived from inorganic
bases such as, for example, sodium, potassium, ammonium, calcium, or ferric
hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino
ethanol, histidine, procaine, and the like.

The immunogenic compounds are administered in a manner compatible with the
dosage formulation, and in such amount as will be prophylactically and/or
therapeutically effective. The quantity to be administered, which is generally in the
range of 5 micrograms to 250 micrograms of polypeptide per dose, depends on the
subject to be treated, capacity of the subject's immune system to synthesize antibodies,
and the degree of protection desired. Precise amounts of active ingredient required to be
administered may depend on the judgment of the practitioner and can be peculiar to
each subject.

The immunogenic compound can be given in a single dose schedule, or
preferably in a multiple dose schedule. A multiple dose schedule is one in which a
primary course of vaccination can be with 1-10 separate doses, followed by other doses
given at subsequent time intervals required to maintain and/or reinforce the immune
response, for example, at 1-4 months for a second dose, and if needed, a subsequent
dose(s) after several months. Further, the course of administration may include
polynucleotides and polypeptides, together or sequentially (for example, priming with a
polynucleotide composition and boosting with a polypeptide composition). The dosage
regimen will also, at least in part, be determined by the need of the individual and be
dependent upon the judgment of the practitioner.

In certain embodiments, administration of the polynucleotides and polypeptides
described herein is used to activate T cells. In addition to the practical advantages of
simplicity of construction and modification, administration of polynucleotides encoding
mutant NS polypeptides results in the synthesis of a mutant NS polypeptide in the host.
Thus, these immunogens are presented to the host immune system with native
post-translational modifications, structure, and conformation. The polynucleotides are
preferably injected intramuscularly to a large mammal, such as a human, at a dose of
0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5 or 10 mg/kg.
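
Converting a per-kilogram dose level into a total amount is a single multiplication.
The sketch below is illustrative arithmetic only: the function name and the body
weight input are hypothetical, and the listed dose levels are simply those recited
above.

```python
# Dose levels (mg/kg) recited in the text for intramuscular polynucleotide injection.
DOSE_LEVELS_MG_PER_KG = (0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5.0, 10.0)


def total_dose_mg(body_weight_kg, dose_mg_per_kg):
    """Return the total polynucleotide amount (mg) for one administration.

    body_weight_kg is a hypothetical input; dose_mg_per_kg must be one of
    the levels listed in DOSE_LEVELS_MG_PER_KG.
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if dose_mg_per_kg not in DOSE_LEVELS_MG_PER_KG:
        raise ValueError("dose level is not one of the recited values")
    return body_weight_kg * dose_mg_per_kg


# A 70 kg subject at the 1.0 mg/kg level
print(total_dose_mg(70.0, 1.0))
```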

The proteins and/or polynucleotides can be administered either to a mammal
which is not infected with an HCV or to an HCV-infected
mammal. The particular dosages of the polynucleotides or fusion proteins in a
composition will depend on many factors including, but not limited to, the species,
age, and general condition of the mammal to which the composition is administered,
and the mode of administration of the composition. An effective amount of the
composition of the invention can be readily determined using only routine
experimentation. In vitro and in vivo models can be employed to identify appropriate
doses. Generally, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5, 5 or 10 mg will be administered to a large
mammal, such as a baboon, chimpanzee, or human. If desired, co-stimulatory
molecules or adjuvants can also be provided before, after, or together with the
compositions.

Antibodies and Diagnostics

Antibodies, both monoclonal and polyclonal, which are directed against HCV
epitopes are particularly useful in diagnosis, and those which are neutralizing are useful
in passive immunotherapy. Monoclonal antibodies, in particular, may be used to raise
anti-idiotype antibodies.

Anti-idiotype antibodies are immunoglobulins which carry an "internal image"
of the antigen of the infectious agent against which protection is desired. Techniques
for raising anti-idiotype antibodies are known in the art. See, e.g., Grzych (1985),
Nature 316:74; MacNamara et al. (1984), Science 226:1325; Uytdehaag et al. (1985), J.
Immunol. 134:1225. These anti-idiotype antibodies may also be useful for treatment
and/or diagnosis of NANBH, as well as for an elucidation of the immunogenic regions
of HCV antigens.

An immunoassay for viral antigen may use, for example, a monoclonal antibody
directed towards a viral epitope, a combination of monoclonal antibodies directed
towards epitopes of one viral polypeptide, monoclonal antibodies directed towards
epitopes of different viral polypeptides, polyclonal antibodies directed towards the
same viral antigen, polyclonal antibodies directed towards different viral antigens, or a
combination of monoclonal and polyclonal antibodies.

Immunoassay protocols may be based, for example, upon competition, or direct
reaction, or sandwich type assays. Protocols may also, for example, use solid supports,
or may be by immunoprecipitation. Most assays involve the use of labeled antibody or
polypeptide. The labels may be, for example, fluorescent, chemiluminescent,
radioactive, or dye molecules. Assays which amplify the signals from the probe are also
known. Examples are assays which utilize biotin and avidin, and enzyme-labeled
and mediated immunoassays, such as ELISA assays.

An enzyme-linked immunosorbent assay (ELISA) can be used to measure either
antigen or antibody concentrations. This method depends upon conjugation of an
enzyme to either an antigen or an antibody, and uses the bound enzyme activity as a
quantitative label. To measure antibody, the known antigen is fixed to a solid phase
(e.g., a microplate or plastic cup), incubated with test serum dilutions, washed,
incubated with anti-immunoglobulin labeled with an enzyme, and washed again.
Enzymes suitable for labeling are known in the art, and include, for example,
horseradish peroxidase. Enzyme activity bound to the solid phase is measured by
adding the specific substrate, and determining product formation or substrate utilization
colorimetrically. The enzyme activity bound is a direct function of the amount of
antibody bound.
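
Because the antibody ELISA above is run on a series of serum dilutions, one common
way to summarize the readout is an endpoint titer. The following is a minimal sketch
under stated assumptions: the cutoff absorbance, the function name, and the example
values are hypothetical and do not come from the disclosure, and absorbance is
assumed to fall as the serum is diluted.

```python
def endpoint_titer(reciprocal_dilutions, absorbances, cutoff):
    """Return the highest reciprocal dilution whose absorbance exceeds the cutoff.

    reciprocal_dilutions: e.g. [100, 400, 1600] for 1:100, 1:400, 1:1600.
    absorbances: colorimetric readings, one per dilution.
    cutoff: assumed background threshold (e.g. derived from negative control sera).
    Returns None when no dilution is above the cutoff.
    """
    if len(reciprocal_dilutions) != len(absorbances):
        raise ValueError("one absorbance is required per dilution")
    positive = [d for d, a in zip(reciprocal_dilutions, absorbances) if a > cutoff]
    return max(positive) if positive else None


# Hypothetical readings for a four-step dilution series
print(endpoint_titer([100, 400, 1600, 6400], [1.8, 0.9, 0.3, 0.08], cutoff=0.2))
```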

To measure antigen, a known specific antibody is fixed to the solid phase, the
test material containing antigen is added, the solid phase is washed after an incubation,
and a second enzyme-labeled antibody is added. After washing, substrate is added, and
enzyme activity is estimated colorimetrically and related to antigen concentration.
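
Relating colorimetric enzyme activity to antigen concentration is typically done
against a set of standards of known concentration. The sketch below uses simple
linear interpolation between neighboring standards; this is one possible approach,
not a method stated in the disclosure, and the function name and example values are
hypothetical.

```python
def interpolate_concentration(standards, absorbance):
    """Estimate antigen concentration from an absorbance reading.

    standards: list of (concentration, absorbance) pairs from known samples,
    assumed to rise in absorbance as concentration rises.
    Interpolates linearly between the two standards that bracket the reading.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= absorbance <= a1:
            if a1 == a0:
                return c0
            return c0 + (absorbance - a0) * (c1 - c0) / (a1 - a0)
    raise ValueError("absorbance is outside the range of the standards")


# Hypothetical standard curve: (concentration, absorbance)
curve = [(0.0, 0.05), (10.0, 0.25), (50.0, 1.05)]
print(interpolate_concentration(curve, 0.65))
```

Readings outside the range of the standards raise an error rather than being
extrapolated, since the curve is only characterized between the standards.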

The HCV fusion proteins, such as NS3 mutant and core fusion proteins, can
also be used to produce HCV-specific polyclonal and monoclonal antibodies.
HCV-specific polyclonal and monoclonal antibodies specifically bind to HCV antigens.

Polyclonal antibodies can be produced by administering the fusion protein to a
mammal, such as a mouse, a rabbit, a goat, or a horse. Serum from the immunized
animal is collected and the antibodies are purified from the plasma by, for example,
precipitation with ammonium sulfate, followed by chromatography, preferably affinity
chromatography. Techniques for producing and processing polyclonal antisera are
known in the art.

Monoclonal antibodies directed against HCV-specific epitopes present in the
fusion proteins can also be readily produced. Normal B cells from a mammal, such as a
mouse, immunized with, e.g., a mutant NS3 polypeptide or NS-core fusion protein can
be fused with, for example, HAT-sensitive mouse myeloma cells to produce
hybridomas. Hybridomas producing HCV-specific antibodies can be identified using
RIA or ELISA and isolated by cloning in semi-solid agar or by limiting dilution.
Clones producing HCV-specific antibodies are isolated by another round of screening.

Antibodies, either monoclonal or polyclonal, which are directed against HCV
epitopes, are particularly useful for detecting the presence of HCV or HCV antigens in
a sample, such as a serum sample from an HCV-infected human. An immunoassay for
an HCV antigen may utilize one antibody or several antibodies. An immunoassay for
an HCV antigen may use, for example, a monoclonal antibody directed towards an
HCV epitope, a combination of monoclonal antibodies directed towards epitopes of one
HCV polypeptide, monoclonal antibodies directed towards epitopes of different HCV
polypeptides, polyclonal antibodies directed towards the same HCV antigen, polyclonal
antibodies directed towards different HCV antigens, or a combination of monoclonal
and polyclonal antibodies. Immunoassay protocols may be based, for example, upon
competition, direct reaction, or sandwich type assays using, for example, labeled
antibody. The labels may be, for example, fluorescent, chemiluminescent, or
radioactive.

The polyclonal or monoclonal antibodies may further be used to isolate HCV
particles or antigens by immunoaffinity columns. The antibodies can be affixed to a
solid support by, for example, adsorption or by covalent linkage so that the antibodies
retain their immunoselective activity. Optionally, spacer groups may be included so
that the antigen binding site of the antibody remains accessible. The immobilized
antibodies can then be used to bind HCV particles or antigens from a biological sample,
such as blood or plasma. The bound HCV particles or antigens are recovered from the
column matrix by, for example, a change in pH.

Methods of Eliciting Immune Responses

HCV-specific T cells that are activated by the above-described polypeptides,
expressed in vivo or in vitro, preferably recognize an epitope of an HCV polypeptide
such as a mutant NS3 polypeptide, including an epitope of a mutant HCV polypeptide.
HCV-specific T cells can be CD8+ or CD4+.

HCV-specific CD8+ T cells preferably are cytotoxic T lymphocytes (CTL)
which can kill HCV-infected cells that display NS3, NS4, NS5a, or NS5b epitopes
complexed with an MHC class I molecule. HCV-specific CD8+ T cells may also
express interferon-γ (IFN-γ). HCV-specific CD8+ T cells can be detected by, for
example, 51Cr release assays. 51Cr release assays measure the ability of HCV-specific
CD8+ T cells to lyse target cells displaying a nonstructural (e.g., mutant NS) epitope.
HCV-specific CD8+ T cells which express IFN-γ can also be detected by
immunological methods, preferably by intracellular staining for IFN-γ after in vitro
stimulation with a mutant NS polypeptide.

HCV-specific CD4+ cells activated by the above-described polypeptides,
expressed in vivo or in vitro, and combinations of the individual components of these
proteins, preferably recognize an epitope of a mutant non-structural polypeptide,
including an epitope of a mutant protein, that is bound to an MHC class II molecule on
an HCV-infected cell and proliferate in response to stimulating mutant peptides.

HCV-specific CD4+ T cells can be detected by a lymphoproliferation assay.
Lymphoproliferation assays measure the ability of HCV-specific CD4+ T cells to
proliferate in response to an epitope.

Mutant NS (or fusions thereof with core, envelope or other viral polypeptides)
can be used to activate HCV-specific T cells either in vitro or in vivo. Activation of
HCV-specific T cells can be used, inter alia, to provide model systems to optimize
CTL responses to HCV and to provide prophylactic or therapeutic treatment against
HCV infection. For in vitro activation, proteins are preferably supplied to T cells via a
plasmid or a viral vector, such as an adenovirus vector, as described above.

Polyclonal populations of T cells can be derived from the blood, and preferably
from peripheral lymphoid organs, such as lymph nodes, spleen, or thymus, of mammals
that have been infected with an HCV. Preferred mammals include mice, chimpanzees,
baboons, and humans. The HCV serves to expand the number of activated HCV-specific
T cells in the mammal. The HCV-specific T cells derived from the mammal
can then be restimulated in vitro by adding HCV epitopic peptides to the T cells. The
HCV-specific T cells can then be tested for, inter alia, proliferation (e.g.,
lymphoproliferation assays known in the art), the production of IFN-γ, and the ability
to lyse target cells displaying HCV NS epitopes in vitro.

The following examples are meant to illustrate the invention and are not meant
to limit it in any way. Those of ordinary skill in the art will recognize modifications
within the spirit and scope of the invention as set forth herein.

EXAMPLES
Example 1: Constructs

pCMV-II: pCMV-II (Figure 7, SEQ ID NO:5) was created to contain the human
CMV promoter, enhancer, intron A, polylinker and the bovine growth hormone
terminator in a deleted-pUC backbone (Life Technologies).

pT7-HCV: pT7-HCV was created in a polylinker-modified pUC vector to
contain full-length HCV cDNA preceded by a synthetic T7 promoter. pT7-HCV also
contains the complete 5' UTR and the poly A version of the 3' UTR.

pCMV.ΔNS35: To generate pCMV.ΔNS35 (Figure 5, SEQ ID NO:3), a two-step
procedure was undertaken. First, a PCR product was generated from pT7-HCV
that corresponded to the following: a 5' EcoRI site, followed by the Kozak sequence of
ACCATGG; the initiator ATG followed by amino acid #1242 and continuing to the
StuI site. Second, the StuI to XbaI fragment from a full-length genomic clone was
isolated. The genomic clone consisted of the T7 promoter fused to the full-length HCV
cDNA with the poly A version of the 3' end, in a pUC vector. Finally, the EcoRI-StuI
and StuI-XbaI fragments were ligated into the pCMV-II expression vector, transformed
into HB101 competent cells and plated onto ampicillin (100 µg/ml). Miniprep analyses
led to the identification of the desired clone, which was amplified on a larger scale using
a Qiagen Gigaprep kit following the manufacturer's specifications. The resulting clone
was named pCMV.ΔNS35 (Figure 5, SEQ ID NO:3).

pd.ΔNS3NS5: As shown schematically in Figure 10, the yeast expression
plasmid pd.ΔNS3NS5 (SEQ ID NO:8) was constructed using restriction fragments
obtained from the mammalian expression plasmid pCMV.KM.ΔNS35.
pCMV.KM.ΔNS35 is identical to pCMV.ΔNS35 (Figure 5, SEQ ID NO:3) except that
it contains a kanamycin resistance gene in the viral backbone. pCMV.KM.ΔNS35 was
digested with EcoRI and NheI to obtain a 2895bp EcoRI-NheI fragment. The EcoRI-NheI
fragment was ligated into a pRSET HindIII-NheI subcloning vector with oligos (HE)
from HindIII to EcoRI. After sequence verification, pRSET HindIII-NheI #6 was
digested with HindIII and NheI to obtain a 2908bp HindIII-NheI fragment.

pCMV.KM.ΔNS35 was linearized with XbaI and ligated with synthetic oligos
(XS) from XbaI-SalI. The ligation was digested with NheI and SalI to obtain a 2481bp
NheI-SalI fragment. The fragment was ligated into a pET3a NheI-SalI subcloning
vector. After sequence verification, pET3a NheI-SalI #2 was digested with NheI and
SalI to obtain a 2481bp NheI-SalI fragment. The BamHI-HindIII ADH2/GAPDH promoter
fragment was then ligated with the HindIII-NheI and NheI-SalI fragments into the pBS24.1
BamHI-SalI yeast expression vector.

pd.ΔNS3NS5.PJ: pd.ΔNS3NS5.PJ (Figures 13 and 14; SEQ ID NO:10) was
generated to create a "perfect junction" at the 5' and 3' ends of the HCV coding region.
At the 5' end of pd.ΔNS3NS5, there were 6 extra bases between the yeast
ADH2/GAPDH promoter and the ATG of the polypeptide. At the 3' end, there were 52
bases of untranslated sequence between the stop codon of the polypeptide and the
α-factor terminator in the yeast expression vector. pd.ΔNS3NS5.PJ was created by
digesting pd.ΔNS3NS5 #17 with ScaI and SphI to obtain a 4963bp ScaI-SphI fragment.
pd.NS5b3011 was digested with SphI and SalI to obtain a 321bp SphI-SalI fragment
which gave the "perfect junction" at the 3' end of the polypeptide. The ScaI-SphI and
SphI-SalI fragments were ligated into a pSP72 HindIII-SalI subcloning vector with
synthetic oligos from HindIII-ScaI (HS) for the "perfect junction" at the 5' end.

The region of synthetic sequence in pSP72 HindIII-SalI clone #6 was verified.
pSP72 HindIII-SalI clone #6 was digested with HindIII and BlnI or with BlnI and SalI
to obtain 2441bp HindIII-BlnI and 2895bp BlnI-SalI fragments, respectively. The
BamHI-HindIII ADH2/GAPDH promoter fragment was ligated with the HindIII-BlnI and
BlnI-SalI fragments into the pBS24.1 BamHI-SalI yeast expression vector.

pd.ΔNS3NS5.PJ.core121RT and pd.ΔNS3NS5.PJ.core173RT were generated
and encode HCV core aa 1-121 at the C-terminus of the ΔNS3NS5 polypeptide
(designated pd.ΔNS3NS5.PJ.core121RT, SEQ ID NO:12) and core aa 1-173 at the
C-terminus of the ΔNS3NS5 polypeptide (designated pd.ΔNS3NS5.PJ.core173RT, SEQ
ID NO:14). The core sequence had aa 9 mutated from Lys to Arg and aa 11 mutated
from Asn to Thr, designated as core 121RT or 173RT.

pd.ΔNS3NS5.PJ.core121RT and pd.ΔNS3NS5.PJ.core173RT: To generate
pd.ΔNS3NS5.PJ.core121RT (Figure 17, SEQ ID NO:12) and
pd.ΔNS3NS5.PJ.core173RT (Figure 18, SEQ ID NO:14), as shown in Figure 16,
NotI-SalI HCVcore121RT and HCVcore173RT fragments were amplified by PCR from an
E. coli expression plasmid, pSODCF2.HCVcore191RT #2. Either the core 121RT NotI-SalI
PCR product or the core 173RT NotI-SalI PCR product was ligated into a pT7Blue2
PstI-SalI subcloning vector with synthetic oligos (PN) from PstI to NotI. After
sequence confirmation, pT7Blue2core121RT clone #9 and pT7Blue2core173RT
clone #11 were digested with PstI and SalI to obtain 403bp and 559bp PstI-SalI
fragments, respectively, for further cloning.

A 121bp NotI-PstI fragment from pSP72 HindIII-SalI clone #6 was isolated as
described above during the cloning of pd.ΔNS3NS5.PJ. The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.

ΔNS3NS5 and Core 140 and Core 150: An HCV core epitope was found which
elicits CTLs in baboons (HCV core aa 121-135). Since pd.ΔNS3NS5.PJ.core121RT
ends right before this potentially important epitope and was expressed better than the
longer pd.ΔNS3NS5.PJ.core173RT construct (Example 2), two intermediate constructs
were made which include this epitope, possibly giving intermediate expression levels.
The two new constructs fused HCV core aa 1-140 or HCV core aa 1-150 to the C
terminus of ΔNS3NS5.PJ.

pd.ΔNS3NS5.PJ.core140RT (Figure 21, SEQ ID NO:16) and
pd.ΔNS3NS5.PJ.core150RT (Figure 22, SEQ ID NO:18): As shown in Figure 20, a
PstI-SalI HCVcore140RT and a PstI-SalI HCVcore150RT fragment were amplified by
PCR from pd.ΔNS3NS5.PJ.core173RT clone #16. Either HCV core PstI-SalI
PCR product was ligated into the pT7Blue2 PstI-SalI subcloning vector. After sequence
confirmation, pT7Blue2core140RT clone #22 and pT7Blue2core150RT clone #26 were
digested with PstI and SalI to obtain 460bp and 490bp PstI-SalI fragments, respectively, for
further cloning.

A 121bp NotI-PstI fragment was isolated from pSP72 HindIII-SalI clone #6 (as
described above during the cloning of pd.ΔNS3NS5.PJ). The NotI-PstI and PstI-SalI
fragments were assembled into a vector made by digesting pd.ΔNS3NS5.PJ clone #5
(described above) with NotI and SalI.

Example 2: Protein Expression 

Various of the constructs described herein, encoding HCV-1 ΔNS3 to NS5
antigen (aa 1242-3011), were expressed in yeast. S. cerevisiae strain AD3 was
transformed with pd.ΔNS3NS5 and checked for expression. A stained protein band at
the expected molecular weight of 194 kD was not observed (Figure 12). Strain AD3
was also transformed with pd.ΔNS3NS5.PJ clone #5 and checked for expression. A
protein band of the expected molecular weight of 194 kD was detected (Figure 15).
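
As a rough plausibility check on the expected band size, the length of the expressed region (aa 1242-3011) can be converted to an approximate mass. The sketch below is illustrative only: the average residue mass of 110 Da is a common rule of thumb rather than a value taken from this document, the function name is hypothetical, and the initiator methionine and any post-translational modification are ignored.

```python
# Assumption: ~110 Da per residue is a rule-of-thumb average, not a value from this document.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(first_aa: int, last_aa: int) -> float:
    """Approximate mass in kDa of a polypeptide spanning first_aa..last_aa inclusive."""
    if last_aa < first_aa:
        raise ValueError("last_aa must not be smaller than first_aa")
    n_residues = last_aa - first_aa + 1
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Region stated in the text: amino acids 1242 through 3011 of the HCV polyprotein.
residue_count = 3011 - 1242 + 1
estimated_kda = approx_mass_kda(1242, 3011)
print(residue_count, estimated_kda)
```

Under these assumptions the region is 1770 residues long and the estimate comes out near 195 kDa, which is in line with the band size of about 194 kD stated above.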

Strain AD3 was transformed with pd.ΔNS3NS5.PJ.core121RT clone #6 and
pd.ΔNS3NS5.PJ.core173RT clone #15 and checked for expression. Protein bands of the
expected molecular weights of 206 kD and 210 kD, respectively, were observed.
Expression levels of the pd.ΔNS3NS5.PJ.core173RT construct were much lower than
those of the pd.ΔNS3NS5.PJ.core121RT construct (see Figure 19). Thus, protein
expression levels correlate with the length of the HCV core sequence, with the longer
core fusion expressed at lower levels.

Strain AD3 was transformed with pd.ΔNS3NS5.PJ.core140RT clone #29 and
pd.ΔNS3NS5.PJ.core150RT clone #35 and checked for expression. Bands of the
expected molecular weights of 208 kD and 209 kD were seen by stain at levels close to
those of pd.ΔNS3NS5.PJ.core173RT (Figure 23).

Example 3: Eliciting Immune Responses

A. Immunization

To evaluate the immunogenicity of the mutant NS polypeptides, studies using
guinea pigs, rabbits, mice, rhesus macaques and/or baboons are performed. The studies
are structured as follows: DNA immunization alone (single or multiple); DNA
immunization followed by protein immunization (boost); and immunization by PLG
particles. Immunization is intramuscular or mucosal.

B. Humoral Immune Response

The humoral immune response is checked in serum specimens from immunized
animals with anti-NS antibody ELISAs (enzyme-linked immunosorbent assays) at
various times post-immunization. Briefly, serum from immunized animals is screened
for antibodies directed against the NS or mutant NS proteins. Wells of ELISA
microtiter plates are coated overnight with the selected HCV protein and washed four
times; subsequently, blocking is done with PBS-0.2% Tween (Sigma). After removal
of the blocking solution, diluted mouse serum is added. Sera are tested at various
dilutions. Microtiter plates are washed and incubated with a secondary,
peroxidase-coupled anti-mouse IgG antibody (Pierce, Rockford, IL). ELISA plates are
washed and 3,3',5,5'-tetramethylbenzidine (TMB; Pierce) is added per well. The optical
density of each well is measured. Titers are typically reported as the reciprocal of the
dilution of serum that gave a half-maximum optical density (O.D.). Similarly, generation
of neutralization of binding (NOB) antibodies can be measured by methods known in the
art.
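
The half-maximum titer described above can be estimated from a dilution series. The following is a minimal sketch, not the procedure used in this document: it assumes the highest measured O.D. stands in for the maximum, interpolates linearly in log10 of the dilution between the two bracketing points, and uses a hypothetical function name and invented readings purely for illustration.

```python
import math


def half_max_titer(reciprocal_dilutions, ods):
    """Return the reciprocal dilution at which O.D. falls to half its maximum.

    Assumption: the largest measured O.D. is treated as the maximum.
    Interpolation is linear in log10(dilution) between the bracketing points.
    Returns None if the measured curve never crosses the half-maximum value.
    """
    pairs = sorted(zip(reciprocal_dilutions, ods))
    half = max(od for _, od in pairs) / 2.0
    for (d_lo, od_lo), (d_hi, od_hi) in zip(pairs, pairs[1:]):
        if od_lo >= half >= od_hi and od_lo != od_hi:
            frac = (od_lo - half) / (od_lo - od_hi)
            log_d = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    return None


# Hypothetical readings for illustration only (not data from this document).
dilutions = [100, 1000, 10000, 100000]
readings = [2.0, 1.5, 0.5, 0.1]
titer = half_max_titer(dilutions, readings)
print(titer)
```

With these invented readings the half-maximum O.D. of 1.0 falls midway between the 1:1000 and 1:10000 dilutions on a log scale, so the reported titer is about 3162. A fitted four-parameter curve would be a reasonable alternative where more dilution points are available.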

C. Cellular Immune Response

The frequency of specific cytotoxic T-lymphocytes (CTL) is evaluated by a
standard chromium release assay of peptide-pulsed BALB/c mouse CD4 cells. Briefly,
spleen cells (effector cells, E) are obtained from the immunized BALB/c mice,
cultured, restimulated, and assayed for CTL activity against HCV peptide-pulsed target
cells. Cytotoxic activity is measured in a standard 51Cr release assay.
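
The text does not state how cytotoxic activity is calculated from the release counts. A conventional readout for chromium release assays is percent specific lysis, computed from experimental, spontaneous and maximum release. The sketch below assumes that convention; the function name and the counts are hypothetical.

```python
def percent_specific_lysis(experimental_cpm, spontaneous_cpm, maximum_cpm):
    """Percent specific lysis from 51Cr release counts (counts per minute).

    Conventional formula (assumed here, not stated in the document):
        100 * (experimental - spontaneous) / (maximum - spontaneous)
    """
    if maximum_cpm <= spontaneous_cpm:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental_cpm - spontaneous_cpm) / (maximum_cpm - spontaneous_cpm)


# Hypothetical counts for illustration only (not data from this document).
lysis = percent_specific_lysis(experimental_cpm=1500, spontaneous_cpm=500, maximum_cpm=2500)
print(lysis)
```

With these invented counts the result is 50 percent specific lysis. In practice the value is usually reported at several effector-to-target (E:T) ratios.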

Example 4: Immunization with PLG-delivered DNA

The polylactide-co-glycolide (PLG) polymers are obtained from Boehringer
Ingelheim, U.S.A. The PLG polymer is RG505, which has a copolymer ratio of 50/50
and a molecular weight of 65 kDa (manufacturer's data). Cationic microparticles with
adsorbed DNA are prepared using a modified solvent evaporation process, essentially
as described in Singh et al., Proc. Natl. Acad. Sci. USA (2000) 97:811-816. Briefly, the
microparticles are prepared by emulsifying a 5% w/v polymer solution in methylene
chloride with PBS at high speed using an IKA homogenizer. The primary emulsion is
then added to distilled water containing cetyl trimethyl ammonium bromide (CTAB)
(0.5% w/v). This results in the formation of a w/o/w emulsion which is stirred at
room temperature, allowing the methylene chloride to evaporate. The resulting
microparticles are washed in distilled water by centrifugation and freeze dried.
Following preparation, washing and collection, DNA is adsorbed onto the
microparticles by incubating cationic microparticles in a solution of DNA. The
microparticles are then separated by centrifugation, the pellet is washed with TE buffer
and the microparticles are freeze dried, resuspended and administered to animals.
Antibody titers are measured by ELISA assays.

What is claimed is:

1. An isolated mutant non-structural ("NS") HCV polypeptide comprising
a polypeptide having a mutation in the catalytic domain of NS3, wherein said mutation
functionally disrupts the catalytic domain.

2. The polypeptide of claim 1, wherein the mutation comprises a deletion.

3. The polypeptide of claim 1, wherein the mutation comprises a
substitution.

4. The polypeptide of any of claims 1-3, wherein said NS polypeptide
comprises NS3, NS4 and NS5.

5. The polypeptide of any of claims 1-3, wherein said NS polypeptide
consists of NS3, NS4 and NS5.

6. The polypeptide of any of claims 1-3, wherein said NS polypeptide
consists of NS3 and NS5.

7. The polypeptide of claim 6, wherein NS5 consists of NS5a.

8. The polypeptide of claim 6, wherein NS5 consists of NS5b.

9. The polypeptide of any of claims 1-3, wherein said NS polypeptide
consists of NS3 and NS4.

10. The polypeptide of claim 9, wherein NS4 consists of NS4a.

11. The polypeptide of claim 9, wherein NS4 consists of NS4b.

12. The polypeptide of claim 4, further comprising a second viral
polypeptide that is not NS3, NS4, or NS5 of HCV.

13. The polypeptide of claim 12, wherein the second viral polypeptide
comprises an HCV Core polypeptide ("C"), or fragment thereof.

14. The polypeptide of claim 13, wherein the C polypeptide is truncated.

15. The polypeptide of claim 14, wherein the truncation is at amino acid
121.

16. The polypeptide of claim 12, wherein the polypeptide further comprises
an HCV envelope protein ("E").

17. The polypeptide of claim 16, wherein the E is E1.

18. The polypeptide of claim 16, wherein the E is E2.

19. A composition comprising
(a) the polypeptide of any one of claims 1-18; and
(b) a pharmaceutically acceptable excipient.

20. An isolated and purified polynucleotide which encodes the mutant HCV
polypeptide according to any one of claims 1-18.

21. A composition comprising
(a) the isolated purified polynucleotide of claim 20; and
(b) a pharmaceutically acceptable excipient.

22. The composition of claim 21, wherein the polynucleotide is DNA.

23. The composition of claim 21, wherein the polynucleotide is in a
plasmid.

24. An expression vector comprising the polynucleotide of claim 20.

25. An expression vector comprising the polynucleotide of SEQ ID NO:8.

26. A host cell comprising the polynucleotide of claim 20.

27. The host cell of claim 26, wherein the cell is a yeast cell.

28. The host cell of claim 26, wherein the cell is a mammalian cell.

29. The host cell of claim 26, wherein the cell is an insect cell.

30. The host cell of claim 26, wherein the cell is a plant cell.

31. The host cell of claim 26, wherein the polynucleotide comprises the
sequence of SEQ ID NO:8.

32. The polypeptide of claim 1, wherein the polypeptide further comprises
SEQ ID NO:9.

33. A method of preparing a mutant NS HCV polypeptide, wherein the
method comprises the steps of:

a. transforming a host cell with an expression vector according to
claim 24, under conditions wherein the polypeptide is expressed;
and

b. isolating the polypeptide.

34. The method of claim 33, wherein the host cell is a yeast cell.

35. The method of claim 33, wherein the host cell is a mammalian cell.

36. The method of claim 33, wherein the host cell is an insect cell.

37. The method of claim 33, wherein the host cell is a plant cell.

38. An antibody that specifically binds to a polypeptide of any of claims
1-18.

39. The antibody of claim 38, wherein the antibody is a monoclonal
antibody.

40. The antibody of claim 38, wherein the antibody is a purified polyclonal
antibody.

41. A method of eliciting an immune response in a subject, comprising the
step of administering to the subject a polypeptide of any of claims 1-18.

42. A method of eliciting an immune response in a subject, comprising the
step of administering to the subject a polynucleotide of claim 20.


1


TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 


81 


GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT 


161 


StuI 

GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG 
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC 


241 


AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT 


321 


ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT 
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA 


401 


CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT 
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA 


481 


AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT 
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA 


561 


GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA 


641 


AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC 
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG 


721 


GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC 
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG 


801 


CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA 
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT 


881 


TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG 
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC


961 


CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTACTGAAC CGTCAGATCG CCTGGAGACG 
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC 


1041 


CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG 


1121 


GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 


1201 


CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT 


1281 


TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT 
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA


1361 


CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT 
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA


1441


TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT 
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA


TTATTAAACA 
AATAATTTGT 


TAGCGTGGGA 
ATCGCACCCT 


TCTCCGACAT 
AGAGGCTGTA 


1521 


CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT 
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCCCCTCGAA 


CCACATCCGA 
GGTGTAGGCT 


GCCCTGGTCC 
CGGGACCAGG 


CATCCGTCCA 
GTAGGCAGGT 


1601 


GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT 


CTTAGGCACA 
GAATCCGTGT 


GCACAATGCC 
CGTGTTACGG 


CACCACCACC 
GTGGTGGTGG 


1681 


AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC 


GCTCGCACCT 
CGAGCGTGGA 


GGACGCAGAT 
CCTGCGTCTA 


1761 


GGAAGACTTA AGGCAGCGCC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA 


ATTCTGATAA 
TAAGACTATT 


GAGTCAGAGG TAACTCCCGT 
CTCAGTCTCC ATTGAGGGCA 


1841 


TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC 


CTGCCGCGCG 
GACGGCGCGC 


CGCCACCAGA 
GCGGTGGTCT 


CATAATAGCT 
GTATTATCGA 


+ 2 






ECORI 


M A A 


1921 


GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT 
CTGTCTGATT GTCTGACAAG CAAAGGTACC CAGAAAAGAC GTCAGTGGCA 


CGTCGACCTA 
GCAGCTGGAT 


AGAATTCACC 
TCTTAAGTGG 


ATGGCTGCAT 
TACCGACGTA 


+2 
2001 


YAAQ GYK VLVL NPS VAA 
ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA 
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT 


T L G F GAY 
ACACTGGGCT TTGGTGCTTA 
TGTGACCCGA AACCACGAAT 


M S K 
CATGTCCAAG 
GTACAGGTTC 


+ 2 
2081 


AHGI DPN ERT GVRT ITT 
GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC 
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG 


G S P 
TGGCAGCCCC 
ACCGTCGGGG 


I T Y S T y G 
ATCACGTACT CCACCTACGG
TAGTGCATGA GGTGGATGCC 


+ 2 
2161 


KFL ADGG CSG GAY DII3 
CAAGTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA 
GTTCAAGGAA CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT 


C C D E 
TTTGTGACGA 
AAACACTGCT 


CHS 
GTGCCACTCC 
CACGGTGAGG 


IDA 
ACGGATGCCA 
TGCCTACGGT 


+ 2 
2241 


TSIL GIG TVLO QAE TAG 
CATCCATCTT GGGCATTGGC ACTGTCCTTG ACCAAGCAGA GACTGCGGGG 
GTAGGTAGAA CCCGTAACCG TGACAGGAAC TGGTTCGTCT CTGACGCCCC 


A R L V V L A 
GCGAGACTGG TTGTGCTCGC 
CGCTCTGACC AACACGAGCG 


TAT 
CACCGCCACC 
GTGGCGGTGG 


2321 


P PGS VTV PHP MIEE VA L STT 
CCTCCGGGCT CCGTCACTGT GCCCCATCCC AACATCGAGG AGGTTGCTCT GTCCACCACC 
GGAGGCCCGA GGCAGTGACA CGGGGTAGGG TTGTAGCTCC TCCAACGAGA CAGGTGGTGG 


G E I E 
GGAGAGATCC 
CCTCTCTAGG 


' F Y G 
CTTTTTACGG 
GAAAAATGCC . 


+2 
2401 


KAI PLEV IKG GRH LIFC HSK 
CAAGGCTATC CCCCTCGAAG TAATCAAGGG GGGGAGACAT CTCATCTTCT GTCATTCAAA 
GTTCCGATAG GGGGAGCTTC ATTAGTTCCC CCCCTCTGTA GAGTAGAAGA CAGTAAGTTT 


K K C 
GAAGAAGTGC 
CTTGTTCACG 


DEL 
GACGAACTCG 
CTGCTTGAGC 


+2 
2481 


AAKL VAL GINA VAY YRG 
CCGCAAAGCT GGTCGCATTG GGCATCAATG CCGTGGCCTA CTACCGCGGT 
GGCGTTTCGA CCAGCGTAAC CCGTACTTAC GGCACCGGAT GATGGCGCCA 


L D V S VIP 
CTTGACGTGT CCGTCATCCC 
GAACTGCACA GGCAGTAGGG 


T S G 
GACCAGCGGC 
CTGGTCGCCG 


^2 
2561 


DVVV VAT DA L MTGY TGD 
GATGTTGTCG TCGTGGCAAC CGATGCCCTC ATGACCGGCT ATACCGGCGA 
CTACAACAGC AGCACCGTTG GCTACGGGAG TACTGGCCGA TATGGCCGCT 


F D S 
CTTCGACTCG 
GAAGCTGAGC 


V I D C N T C 
GTGATAGACT GCAATACGTG
CACTATCTGA CGTTATGCAC



+2 V TQ TVDF SLD PTF TIET ITL PQD AVS
2641 TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATGGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG 



+2RTQR-RGR TGRG KPG lYR FVAP GER PSG 
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CATCTACAGA TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC 
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 



+ 2MF0S SVL CEC YDAG CAW YEL TPAE TTV 
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA 



+2 RLR AYMN TPG LPV CQDH.LEF WEG VFT 

StuI 

2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 



+2GLTH IDA H FLS QTK QSG ENLP YLV AYQ 

stur 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA 

CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 



+ 2ATVC ARA QAP PPSW DQM HKC LIRL KPT 
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 



+2 LHG P TPL LYR LGA VQNE ITL THP VTK 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 



+2YIMT CMS A DLE VVT STW VLVG GVL AAL 
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG 
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA CCGACGAAAC 



+ 2AAYC LST GOV VIVG RVV LSG KPAI XPD 
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA 
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTCCCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT 



+ 2 REV LYRE FDE MEE CSQH LPY lEQ GMM 
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATGGAGCAA GGGATGATGC 
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACG 



+ 2LAEQ FKQ KAL G LLQ TAS RQAE VIA PAV 
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC 
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG 



i-2QTN<i QKL BTF WAKH MWN FIS GIQY L A G 
3521 CAGACCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
GTCTGGTTGA CCGTTTTTGA GCTCTGGAAG ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTTA TGAACCGCCC 

^.2 LST LP GN PAI ASL MAFT AAV TSP L T T 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTGCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAAT GTCGACGACA GTGGTCGGGT GATTGGTGAT 



+2 SQTL LFN ILGG WVA AQL AAPG AAT AFV
3681 GCCAAACCCT CCTCTTCAAC ATATTGGGGG GGTGGGTGGC TGCCCAGCTC GCCGCCCCCG GTGCCGCTAC TGCCTTTGTG
CGGTTTGGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC



+2 GAGL AGA AIG SVGL GKV LID ILAG YGA
3761 GGCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC 
CCGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG 



^2 GVA G AtV AFK IMS GEVP STE DLV NLL 
3841 GGGCGTGGCG GG AGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC 
CCCGCACCGC CCTCGAGAAC ACCCTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG 



+ 2 PAIL SPG ALVV GVV CAA ILRR HVG PGE 
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCCTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG 
GGCGGTAGGA GAGCGGGCCT CGCGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC 



+ 2GAVQ.WMH RLI AFAS RGN HVS PTHY VPE 
4001 GGGGCAGTGC AGTGGATCAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA 
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCAGGGCCT 



+2 S DA AARV TAI LSS LTVT QLL RRL HQW 
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT 



+2ISSE CTT PCSG SWL RDI WDWI CEV LSD 
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAAGGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG 



+ 2FKTW LKA KLM PQLP GIP FVS CQRG YKG 

BamHI 



4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG 
AAATTCTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATATTCCC 



+ 2 VWR GDGI MHT RCH CGAE ITG HVK NGT 
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA 
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT 



+ 2MRrV GPR TCRN MWS GTF PINA YTT GPC 
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT 
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA 



+2TPLP APN YTF ALWR VSA EEY VEIR QVG 
4 481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG 
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACCC 



t2 DFH yVTG MTT DNL KCPC QVP SPE FFT 
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA TTTTTCACAG 
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC 



f2ELDG VRL HRFA PPC KPL LREE VSF RVG 
4 641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA 
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT 



+2LHEY PVG SQL PCEP EPD VAV LTSM LTD 
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA 
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT



+2 PSH ITAE AAG RRL ARGS PPS VAS SSA
4801 TCCCTCCCAT ATAACAGCAG AGGCGGCCGG GCGAAGGTTG GCGAGGGGAT CACCCCCCTC TGTGGCCAGC TCCTCGGCTA
AGGGAGGGTA TATTGTCGTC TCCGCCGGCC CGCTTCCAAC CGCTCCCCTA GTGGGGGGAG ACACCGGTCG AGGAGCCGAT

+2 SQLS APS LKAT CTA NHD SPDA ELI EAN
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG



+2LLWR QEM GGN ITRV ESE NKV VILD SFD
4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT

+2 PLV AEED ERE ISV PAEI LRK SRR FAQ
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG 
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC 

+2ALPV WAR PDYN PPL VET WKKP DYE PPV
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG 
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC 

+ 2VHGC PLP PPK SPPV PPP RKK RTVV LTE 
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT



+ 2 STL STAL AEL ATR SFGS SST SGI TGD 
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA 
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT



+2NTTT SSE PAPS GCP PDS DAES YSS MPP
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC 
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG 



+2 LEGE PGD PDL SDGS WST VSS EANA EDV 
BamHI 



5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT 
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA 



+2 VCC SMSY SWT GAL VTPC AAE EQK LPI
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 



+2NALS NSL LRHH NLV YST TSRS ACQ RQK 
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG 
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC



+2KVTF DRL QVL DSHY QDV LKE VKAA ASK
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA 
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT 



+2 VKA NLLS VEE ACS LTPP HSA KSK FGY 
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG 
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 



+ 2GAKD VRC HARK AVT HIN SVWK DLL EDN 
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 



+2VTPI DTT IMA KNEV FCV QPE KGGR KPA
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC 
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG 
+2 RLI VFPD LGV RVC EKMA LYD VVT KLP
6001 TCGTCTCATC GTGTTCCCCG ATCTCGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA



+2LAVM GSS YGFQ YSP GQR VEFL VQA WKS

EcoRI



6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG



+2KKTP MGF SYD TRCF DST VTE SDIR TEE
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA 
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT 



+2 AIY QCCD LDP QAR VAIK SLT ERL YVG 
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 



+2GPLT NSR GENC GYR RCR ASGV LTT SCG 
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 



+ 2NTLT CYI KAR AACR AAG LQD CTML VCG 
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG 
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+ 2 DDL VVIC ESA GVQ EDAA SLR AFT EAM 
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+ 2TRYS APP GDPP QPE YDL ELIT SCS SNV 
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+ 2SVAH DGA GKR VYYL TRD PTT PLAR AAW 
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS WLG NIIM FAP TLW ARM
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT



+ 2ILMT HFF SVLI ARD QLE QALD CEI YGA 
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG



+ 2CYSI EPL DLP PIIQ RLH GLS AFSL HSY 
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT 



+2 SPG EINR VAA CLR KLGV PPL RAW RHR
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC



+2ARSV RAR LLAR GGR AAI CGKY LFN WAV
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT

+2RTKL KLT PIA AAGQ LDL SGW FTAG YSG
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC



+2 GDI YHSV SHA RPR WIWF CLL LLA AGV
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC



+2GIYL LPN R 
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC 
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG 



BamHI MluI

7361 CAAGATATCA AGGATCCACT ACGCGTTAGA GCTCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT
GTTCTATAGT TCCTAGGTGA TGCGCAATCT CGAGCGACTA GTCGGAGCTG ACACGGAAGA TCAACGGTCG GTAGACAACA 

7441 TTGCCCCTCC CCCGTGCCTT CCTTGACCCT GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT
AACGGGGAGG GGGCACGGAA GGAACTGGGA CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA 

7521 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG GACAGCAAGG GGGAGGATTG GGAAGACAAT
GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCCCACC CCACCCCGTC CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA

7601 AGCAGGCATG CTGGGGAGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT
TCGTCCGTAC GACCCCTCGA GAAGGCGAAG GAGCGAGTGA CTGAGCGACG CGAGCCAGCA AGCCGACGCC GCTCGCCATA 

7681 CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG 
GTCGAGTGAG TTTCCGCCAT TATGCCAATA GGTGTCTTAG TCCCCTATTG CGTCCTTTCT TGTACACTCG TTTTCCGGTC 

7761 CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA
GTTTTCCGGT CCTTGGCATT TTTCCGGCGC AACGACCGCA AAAAGGTATC CGAGGCGGGG GGACTGCTCG TAGTGTTTTT 



7841 TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC
AGCTGCGAGT TCAGTCTCCA CCGCTTTGGG CTGTCCTGAT ATTTCTATGG TCCGCAAAGG GGGACCTTCG AGGGAGCACG

7921 GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC 
CGAGAGGACA AGGCTGGGAC GGCGAATGGC CTATGGACAG GCGGAAAGAG GGAAGCCCTT CGCACCGCGA AAGAGTTACG 

8001 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA 
AGTGCGACAT CCATAGAGTC AAGCCACATC CAGCAAGCGA GGTTCGACCC GACACACGTG CTTGGGGGGC AAGTCGGGCT 

8081 CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 
GGCGACGCGG AATAGGCCAT TGATAGCAGA ACTCAGGTTG GGCCATTCTG TGCTGAATAG CGGTGACCGT CGTCGGTGAC 

8161 GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA 
CATTGTCCTA ATCGTCTCGC TCCATACATC CGCCACGATG TCTCAAGAAC TTCACCACCG GATTGATGCC GATGTGATCT 

8241 AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA
TCCTGTCATA AACCATAGAC GCGAGACGAC TTCGGTCAAT GGAAGCCTTT TTCTCAACCA TCGAGAACTA GGCCGTTTGT 

8321 AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 
TTGGTGGCGA CCATCGCCAC CAAAAAAACA AACGTTCGTC GTCTAATGCG CGTCTTTTTT TCCTAGAGTT CTTCTAGGAA 



8401 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG
ACTAGAAAAG ATGCCCCAGA CTGCGAGTCA CCTTGCTTTT GAGTGCAATT CCCTAAAACC AGTACTCTAA TAGTTTTTCC

8481 ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG
TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC

8561 TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC

8641 TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT
ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA

8721 CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA
GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT

8801 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG
CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC

8881 GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG

8961 CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA

9041 CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT

9121 CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA

9201 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG

9281 CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA
GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT

9361 CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT
GTCCTTCCGT TTTACGGCGT TTTTTCCCTT ATTCCCGCTG TGCCTTTACA ACTTATGAGT ATGAGAAGGA AAAAGTTATA

9441 TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT
ATAACTTCGT AAATAGTCCC AATAACAGAG TACTCGCCTA TGTATAAACT TACATAAATC TTTTTATTTG TTTATCCCCA

9521 TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC ATTAACCTAT AAAAATAGGC
AGGCGCGTGT AAAGGGGCTT TTCACGGTGG ACTGCAGATT CTTTGGTAAT AATAGTACTG TAATTGGATA TTTTTATCCG

9601 GTATCACGAG GCCCTTTCGT C
CATAGTGCTC CGGGAAAGCA G
1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA

81 GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT

StuI

161 GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC

241 AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT

321 ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA

401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA

481 AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA

561 GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA

641 AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG

721 GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG

801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT

881 TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC

961 CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC

1041 CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG

1121 GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT

1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT

1281 TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA

1361 CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA

1441 TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA 



1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC 
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 



1681 AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 



1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 



1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT 
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 



+2 MAA

EcoRI

1921 GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCTGCAT
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGACGTA



+2YAAQ GYK VLVL NPS VAA TLGF GAY MSK
2001 ATGCAGCTCA GGGCTATAAG GTGCTAGTAC TCAACCCCTC TGTTGCTGCA ACACTGGGCT TTGGTGCTTA CATGTCCAAG 
TACGTCGAGT CCCGATATTC CACGATCATG AGTTGGGGAG ACAACGACGT TGTGACCCGA AACCACGAAT GTACAGGTTC 



+2AHGI DPN IRT GVRT ITT GSP ITYS TYG 
2081 GCTCATGGGA TCGATCCTAA CATCAGGACC GGGGTGAGAA CAATTACCAC TGGCAGCCCC ATCACGTACT CCACCTACGG
CGAGTACCCT AGCTAGGATT GTAGTCCTGG CCCCACTCTT GTTAATGGTG ACCGTCGGGG TAGTGCATGA GGTGGATGCC 



+2 KFL ADGG CSG GAY DIII CDE CHS TDA
2161 CAAGTTCCTT GCCGACGGCG GGTGCTCGGG GGGCGCTTAT GACATAATAA TTTGTGACGA GTGCCACTCC ACGGATGCCA
GTTCAAGGAA CGGCTGCCGC CCACGAGCCC CCCGCGAATA CTGTATTATT AAACACTGCT CACGGTGAGG TGCCTACGGT

+2TSIL GIG TVLD QAE TAG ARLV VLA TAT
2241 CATCCATCTT GGGCATTGGC ACTGTCCTTG ACCAAGCAGA GACTGCGGGG GCGAGACTGG TTGTGCTCGC CACCGCCACC
GTAGGTAGAA CCCGTAACCG TGACAGGAAC TGGTTCGTCT CTGACGCCCC CGCTCTGACC AACACGAGCG GTGGCGGTGG



+2PPGS VTV PHP NIEE VAL STT GEIP FYG 
2321 CCTCCGGGCT CCGTCACTGT GCCCCATCCC AACATCGAGG AGGTTGCTCT GTCCACCACC GGAGAGATCC CTTTTTACGG 
GGAGGCCCGA GGCAGTGACA CGGGGTAGGG TTGTAGCTCC TCCAACGAGA CACGTGGTGG CCTCTCTAGG GAAAAATGCC 



+2 KAI PLEV IKG GRH LIFC HSK KKC DEL 
2401 CAAGGCTATC CCCCTCGAAG TAATCAAGGG GGGGAGACAT CTCATCTTCT GTCATTCAAA GAAGAAGTGC GACGAACTCG
GTTCCGATAG GGGGAGCTTC ATTACTTCCC CCCCTCTGTA GAGTAGAAGA CAGTAAGTTT CTTCTTCACG CTGCTTGAGC 



+ 2AAKL VAL GINA VAY YRG LDVS VIP TSG 
2481 CCGCAAAGCT GGTCGCATTG GGCATCAATG CCGTGGCCTA CTACCGCGGT CTTGACGTGT CCGTCATCCC GACCAGCGGC 
GGCGTTTCGA CCAGCGTAAC CCGTAGTTAC GGCACCGGAT GATGGCGCCA GAACTGCACA GGCAGTAGGG CTGGTCGCCG 



+2DVVV VAT DAL MTGY TGD FDS VIDC NTC
2561 GATGTTGTCG TCGTGGCAAC CGATGCCCTC ATGACCGGCT ATACCGGCGA CTTCGACTCG GTGATAGACT GCAATACGTG 
CTACAACAGC AGCACCGTTG GCTACGGGAG TACTGGCCGA TATGGCCGCT GAAGCTGAGC CACTATCTGA CGTTATGCAC 

+2 VTQ TVDF SLD PTF TIET ITL PQD AVS
2641 TGTCACCCAG ACAGTCGATT TCAGCCTTGA CCCTACCTTC ACCATTGAGA CAATCACGCT CCCCCAAGAT GCTGTCTCCC 
ACAGTGGGTC TGTCAGCTAA AGTCGGAACT GGGATCGAAG TGGTAACTCT GTTAGTGCGA GGGGGTTCTA CGACAGAGGG 



+2RTQR RGR TGRG KPG IYR FVAP GER PSG
2721 GCACTCAACG TCGGGGCAGG ACTGGCAGGG GGAAGCCAGG CATCTACAGA TTTGTGGCAC CGGGGGAGCG CCCCTCCGGC
CGTGAGTTGC AGCCCCGTCC TGACCGTCCC CCTTCGGTCC GTAGATGTCT AAACACCGTG GCCCCCTCGC GGGGAGGCCG 



+2MFDS SVL CEC YDAG CAW YEL TPAE TTV
2801 ATGTTCGACT CGTCCGTCCT CTGTGAGTGC TATGACGCAG GCTGTGCTTG GTATGAGCTC ACGCCCGCCG AGACTACAGT 
TACAAGCTGA GCAGGCAGGA GACACTCACG ATACTGCGTC CGACACGAAC CATACTCGAG TGCGGGCGGC TCTGATGTCA 



+2 RLR AYMN TPG LPV CQDH LEF WEG VFT

StuI 

2881 TAGGCTACGA GCGTACATGA ACACCCCGGG GCTTCCCGTG TGCCAGGACC ATCTTGAATT TTGGGAGGGC GTCTTTACAG 
ATCCGATGCT CGCATGTACT TGTGGGGCCC CGAAGGGCAC ACGGTCCTGG TAGAACTTAA AACCCTCCCG CAGAAATGTC 



+2GLTH IDA HFLS QTK QSG ENLP YLV AYQ 
StuI 

2961 GCCTCACTCA TATAGATGCC CACTTTCTAT CCCAGACAAA GCAGAGTGGG GAGAACCTTC CTTACCTGGT AGCGTACCAA
CGGAGTGAGT ATATCTACGG GTGAAAGATA GGGTCTGTTT CGTCTCACCC CTCTTGGAAG GAATGGACCA TCGCATGGTT 



+2ATVC ARA QAP PPSW DQM WKC LIRL KPT
3041 GCCACCGTGT GCGCTAGGGC TCAAGCCCCT CCCCCATCGT GGGACCAGAT GTGGAAGTGT TTGATTCGCC TCAAGCCCAC 
CGGTGGCACA CGCGATCCCG AGTTCGGGGA GGGGGTAGCA CCCTGGTCTA CACCTTCACA AACTAAGCGG AGTTCGGGTG 



+ 2 LHG PTPL LYR LGA VQNE ITL THP VTK 
3121 CCTCCATGGG CCAACACCCC TGCTATACAG ACTGGGCGCT GTTCAGAATG AAATCACCCT GACGCACCCA GTCACCAAAT 
GGAGGTACCC GGTTGTGGGG ACGATATGTC TGACCCGCGA CAAGTCTTAC TTTAGTGGGA CTGCGTGGGT CAGTGGTTTA 



+2YIMT CMS ADLE VVT STW VLVG GVL AAL
3201 ACATCATGAC ATGCATGTCG GCCGACCTGG AGGTCGTCAC GAGCACCTGG GTGCTCGTTG GCGGCGTCCT GGCTGCTTTG 
TGTAGTACTG TACGTACAGC CGGCTGGACC TCCAGCAGTG CTCGTGGACC CACGAGCAAC CGCCGCAGGA CCGACGAAAC 



+ 2AAYC LST GCV VIVG RVV LSG KPAI IPD 
3281 GCCGCGTATT GCCTGTCAAC AGGCTGCGTG GTCATAGTGG GCAGGGTCGT CTTGTCCGGG AAGCCGGCAA TCATACCTGA
CGGCGCATAA CGGACAGTTG TCCGACGCAC CAGTATCACC CGTCCCAGCA GAACAGGCCC TTCGGCCGTT AGTATGGACT



+2 REV LYRE FDE MEE CSQH LPY IEQ GMM
3361 CAGGGAAGTC CTCTACCGAG AGTTCGATGA GATGGAAGAG TGCTCTCAGC ACTTACCGTA CATCGAGCAA GGGATGATGC
GTCCCTTCAG GAGATGGCTC TCAAGCTACT CTACCTTCTC ACGAGAGTCG TGAATGGCAT GTAGCTCGTT CCCTACTACG 



+2LAEQ FKQ KALG LLQ TAS RQAE VIA PAV
3441 TCGCCGAGCA GTTCAAGCAG AAGGCCCTCG GCCTCCTGCA GACCGCGTCC CGTCAGGCAG AGGTTATCGC CCCTGCTGTC 
AGCGGCTCGT CAAGTTCGTC TTCCGGGAGC CGGAGGACGT CTGGCGCAGG GCAGTCCGTC TCCAATAGCG GGGACGACAG 



+2QTNW QKL ETF WAKH MWN FIS GIQY LAG 
3521 CAGACCAACT GGCAAAAACT CGAGACCTTC TGGGCGAAGC ATATGTGGAA CTTCATCAGT GGGATACAAT ACTTGGCGGG 
GTCTGGTTGA CCGTTTTTGA GCTCTGGAAG ACCCGCTTCG TATACACCTT GAAGTAGTCA CCCTATGTTA TGAACCGCCC 



+2 LST LPGN PAI ASL MAFT AAV TSP LTT 
3601 CTTGTCAACG CTGCCTGGTA ACCCCGCCAT TGCTTCATTG ATGGCTTTTA CAGCTGCTGT CACCAGCCCA CTAACCACTA 
GAACAGTTGC GACGGACCAT TGGGGCGGTA ACGAAGTAAC TACCGAAAAT GTCGACGACA GTGGTCGGGT GATTGGTGAT



+2SQTL LFN ILGG WVA AQL AAPG AAT AFV 
3681 GCCAAACCCT CCTCTTCAAC ATATTGGGGG GGTGGGTGGC TGCCCAGCTC GCCGCCCCCG GTGCCGCTAC TGCCTTTGTG 
CGGTTTCGGA GGAGAAGTTG TATAACCCCC CCACCCACCG ACGGGTCGAG CGGCGGGGGC CACGGCGATG ACGGAAACAC 

+2AAGL AGA AIG SVGL GKV LID ILAG YGA
3761 GCCGCTGGCT TAGCTGGCGC CGCCATCGGC AGTGTTGGAC TGGGGAAGGT CCTCATAGAC ATCCTTGCAG GGTATGGCGC
CGGCGACCGA ATCGACCGCG GCGGTAGCCG TCACAACCTG ACCCCTTCCA GGAGTATCTG TAGGAACGTC CCATACCGCG

+2 GVA GALV AFK IMS GEVP STE DLV NLL
3841 GGGCGTGGCG GGAGCTCTTG TGGCATTCAA GATCATGAGC GGTGAGGTCC CCTCCACGGA GGACCTGGTC AATCTACTGC
CCCGCACCGC CCTCGAGAAC ACCGTAAGTT CTAGTACTCG CCACTCCAGG GGAGGTGCCT CCTGGACCAG TTAGATGACG

+2 PAIL SPG ALVV GVV CAA ILRR HVG PGE
3921 CCGCCATCCT CTCGCCCGGA GCCCTCGTAG TCGGCGTGGT CTGTGCAGCA ATACTGCGCC GGCACGTTGG CCCGGGCGAG
GGCGGTAGGA GAGCGGGCCT CGGGAGCATC AGCCGCACCA GACACGTCGT TATGACGCGG CCGTGCAACC GGGCCCGCTC

+2GAVQ WMN RLI AFAS RGN HVS PTHY VPE
4001 GGGGCAGTGC AGTGGATGAA CCGGCTGATA GCCTTCGCCT CCCGGGGGAA CCATGTTTCC CCCACGCACT ACGTGCCGGA
CCCCGTCACG TCACCTACTT GGCCGACTAT CGGAAGCGGA GGGCCCCCTT GGTACAAAGG GGGTGCGTGA TGCACGGCCT

+2 SDA AARV TAI LSS LTVT QLL RRL HQW
4081 GAGCGATGCA GCTGCCCGCG TCACTGCCAT ACTCAGCAGC CTCACTGTAA CCCAGCTCCT GAGGCGACTG CACCAGTGGA
CTCGCTACGT CGACGGGCGC AGTGACGGTA TGAGTCGTCG GAGTGACATT GGGTCGAGGA CTCCGCTGAC GTGGTCACCT

+2ISSE CTT PCSG SWL RDI WDWI CEV LSD
4161 TAAGCTCGGA GTGTACCACT CCATGCTCCG GTTCCTGGCT AAGGGACATC TGGGACTGGA TATGCGAGGT GTTGAGCGAC 
ATTCGAGCCT CACATGGTGA GGTACGAGGC CAAGGACCGA TTCCCTGTAG ACCCTGACCT ATACGCTCCA CAACTCGCTG 

+2FKTW LKA KLM PQLP GIP FVS CQRG YKG

BamHI

4241 TTTAAGACCT GGCTAAAAGC TAAGCTCATG CCACAGCTGC CTGGGATCCC CTTTGTGTCC TGCCAGCGCG GGTATAAGGG
AAATTCTGGA CCGATTTTCG ATTCGAGTAC GGTGTCGACG GACCCTAGGG GAAACACAGG ACGGTCGCGC CCATATTCCC

+2 VWR GDGI MHT RCH CGAE ITG HVK NGT
4321 GGTCTGGCGA GGGGACGGCA TCATGCACAC TCGCTGCCAC TGTGGAGCTG AGATCACTGG ACATGTCAAA AACGGGACGA
CCAGACCGCT CCCCTGCCGT AGTACGTGTG AGCGACGGTG ACACCTCGAC TCTAGTGACC TGTACAGTTT TTGCCCTGCT

+2MRIV GPR TCRN MWS GTF PINA YTT GPC
4401 TGAGGATCGT CGGTCCTAGG ACCTGCAGGA ACATGTGGAG TGGGACCTTC CCCATTAATG CCTACACCAC GGGCCCCTGT 
ACTCCTAGCA GCCAGGATCC TGGACGTCCT TGTACACCTC ACCCTGGAAG GGGTAATTAC GGATGTGGTG CCCGGGGACA 

+2 TPLP APN YTF ALWR VSA EEY VEIR QVG
4481 ACCCCCCTTC CTGCGCCGAA CTACACGTTC GCGCTATGGA GGGTGTCTGC AGAGGAATAC GTGGAGATAA GGCAGGTGGG
TGGGGGGAAG GACGCGGCTT GATGTGCAAG CGCGATACCT CCCACAGACG TCTCCTTATG CACCTCTATT CCGTCCACCC 

+2 DFH YVTG MTT DNL KCPC QVP SPE FFT
4561 GGACTTCCAC TACGTGACGG GTATGACTAC TGACAATCTT AAATGCCCGT GCCAGGTCCC ATCGCCCGAA TTTTTCACAG 
CCTGAAGGTG ATGCACTGCC CATACTGATG ACTGTTAGAA TTTACGGGCA CGGTCCAGGG TAGCGGGCTT AAAAAGTGTC 

+ 2 ELDG VRL HRFA PPC KPL LREE VSF RVG 
4641 AATTGGACGG GGTGCGCCTA CATAGGTTTG CGCCCCCCTG CAAGCCCTTG CTGCGGGAGG AGGTATCATT CAGAGTAGGA 
TTAACCTGCC CCACGCGGAT GTATCCAAAC GCGGGGGGAC GTTCGGGAAC GACGCCCTCC TCCATAGTAA GTCTCATCCT

+2LHEY PVG SQL PCEP EPD VAV LTSM LTD 
4721 CTCCACGAAT ACCCGGTAGG GTCGCAATTA CCTTGCGAGC CCGAACCGGA CGTGGCCGTG TTGACGTCCA TGCTCACTGA 
GAGGTGCTTA TGGGCCATCC CAGCGTTAAT GGAACGCTCG GGCTTGGCCT GCACCGGCAC AACTGCAGGT ACGAGTGACT 

+2 PSH ITAE AAG RRL ARGS PPS VAS SSA 
4801 TCCCTCCCAT ATAACAGCAG AGGCGGCCGG GCGAAGGTTG GCGAGGGGAT CACCCCCCTC TGTGGCCAGC TCCTCGGCTA 
AGGGAGGGTA TATTGTCGTC TCCGCCGGCC CGCTTCCAAC CGCTCCCCTA GTGGGGGGAG ACACCGGTCG AGGAGCCGAT

+2 SQLS APS LKAT CTA NHD SPDA ELI EAN
4881 GCCAGCTATC CGCTCCATCT CTCAAGGCAA CTTGCACCGC TAACCATGAC TCCCCTGATG CTGAGCTCAT AGAGGCCAAC
CGGTCGATAG GCGAGGTAGA GAGTTCCGTT GAACGTGGCG ATTGGTACTG AGGGGACTAC GACTCGAGTA TCTCCGGTTG

+2LLWR QEM GGN ITRV ESE NKV VILD SFD
4961 CTCCTATGGA GGCAGGAGAT GGGCGGCAAC ATCACCAGGG TTGAGTCAGA AAACAAAGTG GTGATTCTGG ACTCCTTCGA
GAGGATACCT CCGTCCTCTA CCCGCCGTTG TAGTGGTCCC AACTCAGTCT TTTGTTTCAC CACTAAGACC TGAGGAAGCT

+2 PLV AEED ERE ISV PAEI LRK SRR FAQ
5041 TCCGCTTGTG GCGGAGGAGG ACGAGCGGGA GATCTCCGTA CCCGCAGAAA TCCTGCGGAA GTCTCGGAGA TTCGCCCAGG
AGGCGAACAC CGCCTCCTCC TGCTCGCCCT CTAGAGGCAT GGGCGTCTTT AGGACGCCTT CAGAGCCTCT AAGCGGGTCC



+2ALPV WAR PDYN PPL VET WKKP DYE PPV
5121 CCCTGCCCGT TTGGGCGCGG CCGGACTATA ACCCCCCGCT AGTGGAGACG TGGAAAAAGC CCGACTACGA ACCACCTGTG
GGGACGGGCA AACCCGCGCC GGCCTGATAT TGGGGGGCGA TCACCTCTGC ACCTTTTTCG GGCTGATGCT TGGTGGACAC

+2VHGC PLP PPK SPPV PPP RKK RTVV LTE
5201 GTCCATGGCT GCCCGCTTCC ACCTCCAAAG TCCCCTCCTG TGCCTCCGCC TCGGAAGAAG CGGACGGTGG TCCTCACTGA
CAGGTACCGA CGGGCGAAGG TGGAGGTTTC AGGGGAGGAC ACGGAGGCGG AGCCTTCTTC GCCTGCCACC AGGAGTGACT

+2 STL STAL AEL ATR SFGS SST SGI TGD
5281 ATCAACCCTA TCTACTGCCT TGGCCGAGCT CGCCACCAGA AGCTTTGGCA GCTCCTCAAC TTCCGGCATT ACGGGCGACA
TAGTTGGGAT AGATGACGGA ACCGGCTCGA GCGGTGGTCT TCGAAACCGT CGAGGAGTTG AAGGCCGTAA TGCCCGCTGT



+ 2 NTTT SSE PAPS GCP PDS DAES YSS MPP 
5361 ATACGACAAC ATCCTCTGAG CCCGCCCCTT CTGGCTGCCC CCCCGACTCC GACGCTGAGT CCTATTCCTC CATGCCCCCC
TATGCTGTTG TAGGAGACTC GGGCGGGGAA GACCGACGGG GGGGCTGAGG CTGCGACTCA GGATAAGGAG GTACGGGGGG

+2 LEGE PGD PDL SDGS WST VSS EANA EDV
BamHI

5441 CTGGAGGGGG AGCCTGGGGA TCCGGATCTT AGCGACGGGT CATGGTCAAC GGTCAGTAGT GAGGCCAACG CGGAGGATGT
GACCTCCCCC TCGGACCCCT AGGCCTAGAA TCGCTGCCCA GTACCAGTTG CCAGTCATCA CTCCGGTTGC GCCTCCTACA

+2 VCC SMSY SWT GAL VTPC AAE EQK LPI
5521 CGTGTGCTGC TCAATGTCTT ACTCTTGGAC AGGCGCACTC GTCACCCCGT GCGCCGCGGA AGAACAGAAA CTGCCCATCA 
GCACACGACG AGTTACAGAA TGAGAACCTG TCCGCGTGAG CAGTGGGGCA CGCGGCGCCT TCTTGTCTTT GACGGGTAGT 

+2 NALS NSL LRHH NLV YST TSRS ACQ RQK
5601 ATGCACTAAG CAACTCGTTG CTACGTCACC ACAATTTGGT GTATTCCACC ACCTCACGCA GTGCTTGCCA AAGGCAGAAG
TACGTGATTC GTTGAGCAAC GATGCAGTGG TGTTAAACCA CATAAGGTGG TGGAGTGCGT CACGAACGGT TTCCGTCTTC



+2KVTF DRL QVL DSHY QDV LKE VKAA ASK
5681 AAAGTCACAT TTGACAGACT GCAAGTTCTG GACAGCCATT ACCAGGACGT ACTCAAGGAG GTTAAAGCAG CGGCGTCAAA
TTTCAGTGTA AACTGTCTGA CGTTCAAGAC CTGTCGGTAA TGGTCCTGCA TGAGTTCCTC CAATTTCGTC GCCGCAGTTT 



+ 2 VKA NLLS VEE ACS LTPP HSA KSK FGY 
5761 AGTGAAGGCT AACTTGCTAT CCGTAGAGGA AGCTTGCAGC CTGACGCCCC CACACTCAGC CAAATCCAAG TTTGGTTATG
TCACTTCCGA TTGAACGATA GGCATCTCCT TCGAACGTCG GACTGCGGGG GTGTGAGTCG GTTTAGGTTC AAACCAATAC 



+2GAKD VRC HARK AVT HIN SVWK DLL EDN
5841 GGGCAAAAGA CGTCCGTTGC CATGCCAGAA AGGCCGTAAC CCACATCAAC TCCGTGTGGA AAGACCTTCT GGAAGACAAT 
CCCGTTTTCT GCAGGCAACG GTACGGTCTT TCCGGCATTG GGTGTAGTTG AGGCACACCT TTCTGGAAGA CCTTCTGTTA 



+2VTPI DTT IMA KNEV FCV QPE KGGR KPA 
5921 GTAACACCAA TAGACACTAC CATCATGGCT AAGAACGAGG TTTTCTGCGT TCAGCCTGAG AAGGGGGGTC GTAAGCCAGC
CATTGTGGTT ATCTGTGATG GTAGTACCGA TTCTTGCTCC AAAAGACGCA AGTCGGACTC TTCCCCCCAG CATTCGGTCG

+2 RLI VFPD LGV RVC EKMA LYD VVT KLP
6001 TCGTCTCATC GTGTTCCCCG ATCTGGGCGT GCGCGTGTGC GAAAAGATGG CTTTGTACGA CGTGGTTACA AAGCTCCCCT
AGCAGAGTAG CACAAGGGGC TAGACCCGCA CGCGCACACG CTTTTCTACC GAAACATGCT GCACCAATGT TTCGAGGGGA 

+2LAVM GSS YGFQ YSP GQR VEFL VQA WKS

EcoRI 



6081 TGGCCGTGAT GGGAAGCTCC TACGGATTCC AATACTCACC AGGACAGCGG GTTGAATTCC TCGTGCAAGC GTGGAAGTCC 
ACCGGCACTA CCCTTCGAGG ATGCCTAAGG TTATGAGTGG TCCTGTCGCC CAACTTAAGG AGCACGTTCG CACCTTCAGG 



+2KKTP MGF SYD TRCF DST VTE SDIR TEE
6161 AAGAAAACCC CAATGGGGTT CTCGTATGAT ACCCGCTGCT TTGACTCCAC AGTCACTGAG AGCGACATCC GTACGGAGGA
TTCTTTTGGG GTTACCCCAA GAGCATACTA TGGGCGACGA AACTGAGGTG TCAGTGACTC TCGCTGTAGG CATGCCTCCT



+2 AIY QCCD LDP QAR VAIK SLT ERL YVG
6241 GGCAATCTAC CAATGTTGTG ACCTCGACCC CCAAGCCCGC GTGGCCATCA AGTCCCTCAC CGAGAGGCTT TATGTTGGGG 
CCGTTAGATG GTTACAACAC TGGAGCTGGG GGTTCGGGCG CACCGGTAGT TCAGGGAGTG GCTCTCCGAA ATACAACCCC 

+2GPLT NSR GENC GYR RCR ASGV LTT SCG
6321 GCCCTCTTAC CAATTCAAGG GGGGAGAACT GCGGCTATCG CAGGTGCCGC GCGAGCGGCG TACTGACAAC TAGCTGTGGT 
CGGGAGAATG GTTAAGTTCC CCCCTCTTGA CGCCGATAGC GTCCACGGCG CGCTCGCCGC ATGACTGTTG ATCGACACCA 



+2NTLT CYI KAR AACR AAG LQD CTML VCG 
6401 AACACCCTCA CTTGCTACAT CAAGGCCCGG GCAGCCTGTC GAGCCGCAGG GCTCCAGGAC TGCACCATGC TCGTGTGTGG
TTGTGGGAGT GAACGATGTA GTTCCGGGCC CGTCGGACAG CTCGGCGTCC CGAGGTCCTG ACGTGGTACG AGCACACACC 



+2 DDL VVIC ESA GVQ EDAA SLR AFT EAM
6481 CGACGACTTA GTCGTTATCT GTGAAAGCGC GGGGGTCCAG GAGGACGCGG CGAGCCTGAG AGCCTTCACG GAGGCTATGA 
GCTGCTGAAT CAGCAATAGA CACTTTCGCG CCCCCAGGTC CTCCTGCGCC GCTCGGACTC TCGGAAGTGC CTCCGATACT 



+2TRYS APP GDPP QPE YDL ELIT SCS SNV
6561 CCAGGTACTC CGCCCCCCCT GGGGACCCCC CACAACCAGA ATACGACTTG GAGCTCATAA CATCATGCTC CTCCAACGTG 
GGTCCATGAG GCGGGGGGGA CCCCTGGGGG GTGTTGGTCT TATGCTGAAC CTCGAGTATT GTAGTACGAG GAGGTTGCAC 



+2SVAH DGA GKR VYYL TRD PTT PLAR AAW 
6641 TCAGTCGCCC ACGACGGCGC TGGAAAGAGG GTCTACTACC TCACCCGTGA CCCTACAACC CCCCTCGCGA GAGCTGCGTG 
AGTCAGCGGG TGCTGCCGCG ACCTTTCTCC CAGATGATGG AGTGGGCACT GGGATGTTGG GGGGAGCGCT CTCGACGCAC 



+2 ETA RHTP VNS WLG NIIM FAP TLW ARM
6721 GGAGACAGCA AGACACACTC CAGTCAATTC CTGGCTAGGC AACATAATCA TGTTTGCCCC CACACTGTGG GCGAGGATGA 
CCTCTGTCGT TCTGTGTGAG GTCAGTTAAG GACCGATCCG TTGTATTAGT ACAAACGGGG GTGTGACACC CGCTCCTACT 



+2ILMT HFF SVLI ARD QLE QALD CEI YGA
6801 TACTGATGAC CCATTTCTTT AGCGTCCTTA TAGCCAGGGA CCAGCTTGAA CAGGCCCTCG ATTGCGAGAT CTACGGGGCC 
ATGACTACTG GGTAAAGAAA TCGCAGGAAT ATCGGTCCCT GGTCGAACTT GTCCGGGAGC TAACGCTCTA GATGCCCCGG 



+2CYSI EPL DLP PIIQ RLH GLS AFSL HSY
6881 TGCTACTCCA TAGAACCACT GGATCTACCT CCAATCATTC AAAGACTCCA TGGCCTCAGC GCATTTTCAC TCCACAGTTA
ACGATGAGGT ATCTTGGTGA CCTAGATGGA GGTTAGTAAG TTTCTGAGGT ACCGGAGTCG CGTAAAAGTG AGGTGTCAAT



+ 2 SPG EINR VAA CLR KLGV PPL RAW RHR 
6961 CTCTCCAGGT GAAATCAATA GGGTGGCCGC ATGCCTCAGA AAACTTGGGG TACCGCCCTT GCGAGCTTGG AGACACCGGG 
GAGAGGTCCA CTTTAGTTAT CCCACCGGCG TACGGAGTCT TTTGAACCCC ATGGCGGGAA CGCTCGAACC TCTGTGGCCC 



+2ARSV RAR LLAR GGR AAI CGKY LFN WAV
7041 CCCGGAGCGT CCGCGCTAGG CTTCTGGCCA GAGGAGGCAG GGCTGCCATA TGTGGCAAGT ACCTCTTCAA CTGGGCAGTA
GGGCCTCGCA GGCGCGATCC GAAGACCGGT CTCCTCCGTC CCGACGGTAT ACACCGTTCA TGGAGAAGTT GACCCGTCAT

+2RTKL KLT PIA AAGQ LDL SGW FTAG YSG
7121 AGAACAAAGC TCAAACTCAC TCCAATAGCG GCCGCTGGCC AGCTGGACTT GTCCGGCTGG TTCACGGCTG GCTACAGCGG
TCTTGTTTCG AGTTTGAGTG AGGTTATCGC CGGCGACCGG TCGACCTGAA CAGGCCGACC AAGTGCCGAC CGATGTCGCC



+2 GDIY HSV SHA RPR WIWF CLL LLA AGV
7201 GGGAGACATT TATCACAGCG TGTCTCATGC CCGGCCCCGC TGGATCTGGT TTTGCCTACT CCTGCTTGCT GCAGGGGTAG 
CCCTCTGTAA ATAGTGTCGC ACAGAGTACG GGCCGGGGCG ACCTAGACCA AAACGGATGA GGACGAACGA CGTCCCCATC 



+2GIYL LPN R
7281 GCATCTACCT CCTCCCCAAC CGATGAAGGT TGGGGTAAAC ACTCCGGCCT AAAAAAAAAA AAAAATCTAG AAAGGCGCGC
CGTAGATGGA GGAGGGGTTG GCTACTTCCA ACCCCATTTG TGAGGCCGGA TTTTTTTTTT TTTTTAGATC TTTCCGCGCG



BamHI MluI



7361 CAAGATATCA AGGATCCACT ACGCGTTAGA GCTCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT
GTTCTATAGT TCCTAGGTGA TGCGCAATCT CGAGCGACTA GTCGGAGCTG ACACGGAAGA TCAACGGTCG GTAGACAACA 



7441 TTGCCCCTCC CCCGTGCCTT CCTTGACCCT GGAAGGTGCC ACTCCCACTG TCCTTTCCTA ATAAAATGAG GAAATTGCAT
AACGGGGAGG GGGCACGGAA GGAACTGGGA CCTTCCACGG TGAGGGTGAC AGGAAAGGAT TATTTTACTC CTTTAACGTA 



7521 CGCATTGTCT GAGTAGGTGT CATTCTATTC TGGGGGGTGG GGTGGGGCAG GACAGCAAGG GGGAGGATTG GGAAGACAAT
GCGTAACAGA CTCATCCACA GTAAGATAAG ACCCCCCACC CCACCCCGTC CTGTCGTTCC CCCTCCTAAC CCTTCTGTTA 



7601 AGCAGGCATG CTGGGGAGCT CTTCCGCTTC CTCGCTCACT GACTCGCTGC GCTCGGTCGT TCGGCTGCGG CGAGCGGTAT
TCGTCCGTAC GACCCCTCGA GAAGGCGAAG GAGCGAGTGA CTGAGCGACG CGAGCCAGCA AGCCGACGCC GCTCGCCATA



7681 CAGCTCACTC AAAGGCGGTA ATACGGTTAT CCACAGAATC AGGGGATAAC GCAGGAAAGA ACATGTGAGC AAAAGGCCAG
GTCGAGTGAG TTTCCGCCAT TATGCCAATA GGTGTCTTAG TCCCCTATTG CGTCCTTTCT TGTACACTCG TTTTCCGGTC 



7761 CAAAAGGCCA GGAACCGTAA AAAGGCCGCG TTGCTGGCGT TTTTCCATAG GCTCCGCCCC CCTGACGAGC ATCACAAAAA
GTTTTCCGGT CCTTGGCATT TTTCCGGCGC AACGACCGCA AAAAGGTATC CGAGGCGGGG GGACTGCTCG TAGTGTTTTT 



7841 TCGACGCTCA AGTCAGAGGT GGCGAAACCC GACAGGACTA TAAAGATACC AGGCGTTTCC CCCTGGAAGC TCCCTCGTGC
AGCTGCGAGT TCAGTCTCCA CCGCTTTGGG CTGTCCTGAT ATTTCTATGG TCCGCAAAGG GGGACCTTCG AGGGAGCACG



7921 GCTCTCCTGT TCCGACCCTG CCGCTTACCG GATACCTGTC CGCCTTTCTC CCTTCGGGAA GCGTGGCGCT TTCTCAATGC
CGAGAGGACA AGGCTGGGAC GGCGAATGGC CTATGGACAG GCGGAAAGAG GGAAGCCCTT CGCACCGCGA AAGAGTTACG



8001 TCACGCTGTA GGTATCTCAG TTCGGTGTAG GTCGTTCGCT CCAAGCTGGG CTGTGTGCAC GAACCCCCCG TTCAGCCCGA
AGTGCGACAT CCATAGAGTC AAGCCACATC CAGCAAGCGA GGTTCGACCC GACACACGTG CTTGGGGGGC AAGTCGGGCT 



8081 CCGCTGCGCC TTATCCGGTA ACTATCGTCT TGAGTCCAAC CCGGTAAGAC ACGACTTATC GCCACTGGCA GCAGCCACTG 
GGCGACGCGG AATAGGCCAT TGATAGCAGA ACTCAGGTTG GGCCATTCTG TGCTGAATAG CGGTGACCGT CGTCGGTGAC



8161 GTAACAGGAT TAGCAGAGCG AGGTATGTAG GCGGTGCTAC AGAGTTCTTG AAGTGGTGGC CTAACTACGG CTACACTAGA 
CATTGTCCTA ATCGTCTCGC TCCATACATC CGCCACGATG TCTCAAGAAC TTCACCACCG GATTGATGCC GATGTGATCT 



8241 AGGACAGTAT TTGGTATCTG CGCTCTGCTG AAGCCAGTTA CCTTCGGAAA AAGAGTTGGT AGCTCTTGAT CCGGCAAACA 
TCCTGTCATA AACCATAGAC GCGAGACGAC TTCGGTCAAT GGAAGCCTTT TTCTCAACCA TCGAGAACTA GGCCGTTTGT 



8321 AACCACCGCT GGTAGCGGTG GTTTTTTTGT TTGCAAGCAG CAGATTACGC GCAGAAAAAA AGGATCTCAA GAAGATCCTT 
TTGGTGGCGA CCATCGCCAC CAAAAAAACA AACGTTCGTC GTCTAATGCG CGTCTTTTTT TCCTAGAGTT CTTCTAGGAA 



8401 TGATCTTTTC TACGGGGTCT GACGCTCAGT GGAACGAAAA CTCACGTTAA GGGATTTTGG TCATGAGATT ATCAAAAAGG 
ACTAGAAAAG ATGCCCCAGA CTGCGAGTCA CCTTGCTTTT GAGTGCAATT CCCTAAAACC AGTACTCTAA TAGTTTTTCC

8481 ATCTTCACCT AGATCCTTTT AAATTAAAAA TGAAGTTTTA AATCAATCTA AAGTATATAT GAGTAAACTT GGTCTGACAG
TAGAAGTGGA TCTAGGAAAA TTTAATTTTT ACTTCAAAAT TTAGTTAGAT TTCATATATA CTCATTTGAA CCAGACTGTC

8561 TTACCAATGC TTAATCAGTG AGGCACCTAT CTCAGCGATC TGTCTATTTC GTTCATCCAT AGTTGCCTGA CTCCCCGTCG
AATGGTTACG AATTAGTCAC TCCGTGGATA GAGTCGCTAG ACAGATAAAG CAAGTAGGTA TCAACGGACT GAGGGGCAGC

8641 TGTAGATAAC TACGATACGG GAGGGCTTAC CATCTGGCCC CAGTGCTGCA ATGATACCGC GAGACCCACG CTCACCGGCT
ACATCTATTG ATGCTATGCC CTCCCGAATG GTAGACCGGG GTCACGACGT TACTATGGCG CTCTGGGTGC GAGTGGCCGA

8721 CCAGATTTAT CAGCAATAAA CCAGCCAGCC GGAAGGGCCG AGCGCAGAAG TGGTCCTGCA ACTTTATCCG CCTCCATCCA
GGTCTAAATA GTCGTTATTT GGTCGGTCGG CCTTCCCGGC TCGCGTCTTC ACCAGGACGT TGAAATAGGC GGAGGTAGGT

8801 GTCTATTAAT TGTTGCCGGG AAGCTAGAGT AAGTAGTTCG CCAGTTAATA GTTTGCGCAA CGTTGTTGCC ATTGCTACAG
CAGATAATTA ACAACGGCCC TTCGATCTCA TTCATCAAGC GGTCAATTAT CAAACGCGTT GCAACAACGG TAACGATGTC

8881 GCATCGTGGT GTCACGCTCG TCGTTTGGTA TGGCTTCATT CAGCTCCGGT TCCCAACGAT CAAGGCGAGT TACATGATCC
CGTAGCACCA CAGTGCGAGC AGCAAACCAT ACCGAAGTAA GTCGAGGCCA AGGGTTGCTA GTTCCGCTCA ATGTACTAGG

8961 CCCATGTTGT GCAAAAAAGC GGTTAGCTCC TTCGGTCCTC CGATCGTTGT CAGAAGTAAG TTGGCCGCAG TGTTATCACT
GGGTACAACA CGTTTTTTCG CCAATCGAGG AAGCCAGGAG GCTAGCAACA GTCTTCATTC AACCGGCGTC ACAATAGTGA

9041 CATGGTTATG GCAGCACTGC ATAATTCTCT TACTGTCATG CCATCCGTAA GATGCTTTTC TGTGACTGGT GAGTACTCAA
GTACCAATAC CGTCGTGACG TATTAAGAGA ATGACAGTAC GGTAGGCATT CTACGAAAAG ACACTGACCA CTCATGAGTT

9121 CCAAGTCATT CTGAGAATAG TGTATGCGGC GACCGAGTTG CTCTTGCCCG GCGTCAATAC GGGATAATAC CGCGCCACAT
GGTTCAGTAA GACTCTTATC ACATACGCCG CTGGCTCAAC GAGAACGGGC CGCAGTTATG CCCTATTATG GCGCGGTGTA

9201 AGCAGAACTT TAAAAGTGCT CATCATTGGA AAACGTTCTT CGGGGCGAAA ACTCTCAAGG ATCTTACCGC TGTTGAGATC
TCGTCTTGAA ATTTTCACGA GTAGTAACCT TTTGCAAGAA GCCCCGCTTT TGAGAGTTCC TAGAATGGCG ACAACTCTAG

9281 CAGTTCGATG TAACCCACTC GTGCACCCAA CTGATCTTCA GCATCTTTTA CTTTCACCAG CGTTTCTGGG TGAGCAAAAA
GTCAAGCTAC ATTGGGTGAG CACGTGGGTT GACTAGAAGT CGTAGAAAAT GAAAGTGGTC GCAAAGACCC ACTCGTTTTT

9361 CAGGAAGGCA AAATGCCGCA AAAAAGGGAA TAAGGGCGAC ACGGAAATGT TGAATACTCA TACTCTTCCT TTTTCAATAT
GTCCTTCCGT TTTACGGCGT TTTTTCCCTT ATTCCCGCTG TGCCTTTACA ACTTATGAGT ATGAGAAGGA AAAAGTTATA

9441 TATTGAAGCA TTTATCAGGG TTATTGTCTC ATGAGCGGAT ACATATTTGA ATGTATTTAG AAAAATAAAC AAATAGGGGT
ATAACTTCGT AAATAGTCCC AATAACAGAG TACTCGCCTA TGTATAAACT TACATAAATC TTTTTATTTG TTTATCCCCA

9521 TCCGCGCACA TTTCCCCGAA AAGTGCCACC TGACGTCTAA GAAACCATTA TTATCATGAC ATTAACCTAT AAAAATAGGC
AGGCGCGTGT AAAGGGGCTT TTCACGGTGG ACTGCAGATT CTTTGGTAAT AATAGTACTG TAATTGGATA TTTTTATCCG

9601 GTATCACGAG GCCCTTTCGT C
CATAGTGCTC CGGGAAAGCA G



1


TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG GAGACGGTCA CAGCTTGTCT GTAAGCGGAT 
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC CTCTGCCAGT GTCGAACAGA CATTCGCCTA 


81 


GCCGGGAGCA GACAAGCCCG TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG CGGCATCAGA
CGGCCCTCGT CTGTTCGGGC AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC GCCGTAGTCT


161 


GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG 
CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC 


241 


AATAGCTCAG AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA TGGGGCGGAG AATGGGCGGA 
TTATCGAGTC TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT ACCCCGCCTC TTACCCGCCT 


321 


ACTGGGCGGG GAGGGAATTA TTGGCTATTG GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
TGACCCGCCC CTCCCTTAAT AACCGATAAC CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA 


401 


CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT 
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT ATCATTAGTT AATGCCCCAG TAATCAAGTA 


481


AGCCCATATA TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC CCCGCCCATT
TCGGGTATAT ACCTCAAGGC GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG GGGCGGGTAA 


561 


GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT
CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA


641 


AAACTGCCCA CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG TCAATGACGG TAAATGGCCC 
TTTGACGGGT GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC AGTTACTGCC ATTTACCGGG


721 


GCCTGGCATT ATGCCCAGTA CATGACCTTA CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC 
CGGACCGTAA TACGGGTCAT GTACTGGAAT GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG 


801 


CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG ACTCACGGGG ATTTCCAAGT CTCCACCCCA 
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT


881 


TTGACGTCAA TGGGAGTTTG TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC CCCGTTGACG 
AACTGCAGTT ACCCTCAAAC AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG GGGCAACTGC 


961 


CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG 
GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC 


1041 


CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG GGAACGGTGC ATTGGAACGC 
GGTAGGTGCG ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC CCTTGCCACG TAACCTTGCG 


1121 


GGATTCCCCG TGCCAAGAGT GACGTAAGTA CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA 
CCTAAGGGGC ACGGTTCTCA CTGCATTCAT GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT 


1201 


CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA TGGTATAGCT TAGCCTATAG GTGTGGGTTA 
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT ACCATATCGA ATCGGATATC CACACCCAAT


1281 


TTGACCATTA TTGACCACTC CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG CCACAACTAT
AACTGGTAAT AACTGGTGAG GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC GGTGTTGATA


1361 


CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT
GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA 


1441 


TATTTACAAA TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA TAGCGTGGGA TCTCCGACAT 
ATAAATGTTT AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT ATCGCACCCT AGAGGCTGTA

1521 CTCGGGTACG TGTTCCGGAC ATGGGCTCTT CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA
GAGCCCATGC ACAAGGCCTG TACCCGAGAA GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT 



1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA CTTAGGCACA GCACAATGCC CACCACCACC
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT GAATCCGTGT CGTGTTACGG GTGGTGGTGG 



1681 AGTGTGCCGC ACAAGGCCGT GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT GGACGCAGAT 
TCACACGGCG TGTTCCGGCA CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA CCTGCGTCTA 



1761 GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT 
CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA 



1841 TGCGGTGCTG TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG CGCCACCAGA CATAATAGCT
ACGCCACGAC AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC GCGGTGGTCT GTATTATCGA 



EcoRI 



1921 GACAGACTAA CAGACTGTTC CTTTCCATGG GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCAGA CTCGAGCAAG 
CTGTCTGATT GTCTGACAAG GAAAGGTACC CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTCT GAGCTCGTTC



XbaI BamHI MluI



2001 TCTAGAAAGG CGCGCCAAGA TATCAAGGAT CCACTACGCG TTAGAGCTCG CTGATCAGCC TCGACTGTGC CTTCTAGTTG
AGATCTTTCC GCGCGGTTCT ATAGTTCCTA GGTGATGCGC AATCTCGAGC GACTAGTCGG AGCTGACACG GAAGATCAAC 



2081 CCAGCCATCT GTTGTTTGCC CCTCCCCCGT GCCTTCCTTG ACCCTGGAAG GTGCCACTCC CACTGTCCTT TCCTAATAAA
GGTCGGTAGA CAACAAACGG GGAGGGGGCA CGGAAGGAAC TGGGACCTTC CACGGTGAGG GTGACAGGAA AGGATTATTT 



2161 ATGAGGAAAT TGCATCGCAT TGTCTGAGTA GGTGTCATTC TATTCTGGGG GGTGGGGTGG GGCAGGACAG CAAGGGGGAG 
TACTCCTTTA ACGTAGCGTA ACAGACTCAT CCACAGTAAG ATAAGACCCC CCACCCCACC CCGTCCTGTC GTTCCCCCTC 



2241 GATTGGGAAG ACAATAGCAG GCATGCTGGG GAGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC 
CTAACCCTTC TGTTATCGTC CGTACGACCC CTCGAGAAGG CGAAGGAGCG AGTGACTGAG CGACGCGAGC CAGCAAGCCG



2321 TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG 
ACGCCGCTCG CCATAGTCGA GTGAGTTTCC GCCATTATGC CAATAGGTGT CTTAGTCCCC TATTGCGTCC TTTCTTGTAC



2401 TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 
ACTCGTTTTC CGGTCGTTTT CCGGTCCTTG GCATTTTTCC GGCGCAACGA CCGCAAAAAG GTATCCGAGG CGGGGGGACT



2481 CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG 
GCTCGTAGTG TTTTTAGCTG CGAGTTCAGT CTCCACCGCT TTGGGCTGTC CTGATATTTC TATGGTCCGC AAAGGGGGAC 



2561 GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG 
CTTCGAGGGA GCACGCGAGA GGACAAGGCT GGGACGGCGA ATGGCCTATG GACAGGCGGA AAGAGGGAAG CCCTTCGCAC 



2641 GCGCTTTCTC AATGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 
CGCGAAAGAG TTACGAGTGC GACATCCATA GAGTCAAGCC ACATCCAGCA AGCGAGGTTC GACCCGACAC ACGTGCTTGG 



2721 CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC
GGGGCAAGTC GGGCTGGCGA CGCGGAATAG GCCATTGATA GCAGAACTCA GGTTGGGCCA TTCTGTGCTG AATAGCGGTG



2801 TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC 
ACCGTCGTCG GTGACCATTG TCCTAATCGT CTCGCTCCAT ACATCCGCCA CGATGTCTCA AGAACTTCAC CACCGGATTG 



2881 TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 
ATGCCGATGT GATCTTCCTG TCATAAACCA TAGACGCGAG ACGACTTCGG TCAATGGAAG CCTTTTTCTC AACCATCGAG

2961 TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT



AACTAGGCCG TTTGTTTGGT GGCGACCATC GCCACCAAAA AAACAAACGT TCGTCGTCTA ATGCGCGTCT TTTTTTCCTA 




3041 CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG
GAGTTCTTCT AGGAAACTAG AAAAGATGCC CCAGACTGCG AGTCACCTTG CTTTTGAGTG CAATTCCCTA AAACCAGTAC


3121 


AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA
TCTAATAGTT TTTCCTAGAA GTGGATCTAG GAAAATTTAA TTTTTACTTC AAAATTTAGT TAGATTTCAT ATATACTCAT 


3201 


AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG 
TTGAACCAGA CTGTCAATGG TTACGAATTA GTCACTCCGT GGATAGAGTC GCTAGACAGA TAAAGCAAGT AGGTATCAAC


3281


CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC 
GGACTGAGGG GCAGCACATC TATTGATGCT ATGCCCTCCC GAATGGTAGA CCGGGGTCAC GACGTTACTA TGGCGCTCTG 


3361 


CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 
GGTGCGAGTG GCCGAGGTCT AAATAGTCGT TATTTGGTCG GTCGGCCTTC CCGGCTCGCG TCTTCACCAG GACGTTGAAA


3441 


ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG
TAGGCGGAGG TAGGTCAGAT AATTAACAAC GGCCCTTCGA TCTCATTCAT CAAGCGGTCA ATTATCAAAC GCGTTGCAAC 


3521 


TTGCCATTGC TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG
AACGGTAACG ATGTCCGTAG CACCACAGTG CGAGCAGCAA ACCATACCGA AGTAAGTCGA GGCCAAGGGT TGCTAGTTCC


3601 


CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 
GCTCAATGTA CTAGGGGGTA CAACACGTTT TTTCGCCAAT CGAGGAAGCC AGGAGGCTAG CAACAGTCTT CATTCAACCG


3681 


CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA 
GCGTCACAAT AGTGAGTACC AATACCGTCG TGACGTATTA AGAGAATGAC AGTACGGTAG GCATTCTACG AAAAGACACT 


3761 


CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AATACGGGAT
GACCACTCAT GAGTTGGTTC AGTAAGACTC TTATCACATA CGCCGCTGGC TCAACGAGAA CGGGCCGCAG TTATGCCCTA 


3841 


AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 
TTATGGCGCG GTGTATCGTC TTGAAATTTT CACGAGTAGT AACCTTTTGC AAGAAGCCCC GCTTTTGAGA GTTCCTAGAA 


3921 


ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT 
TGGCGACAAC TCTAGGTCAA GCTACATTGG GTGAGCACGT GGGTTGACTA GAAGTCGTAG AAAATGAAAG TGGTCGCAAA 


4001 


CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC 
GACCCACTCG TTTTTGTCCT TCCGTTTTAC GGCGTTTTTT CCCTTATTCC CGCTGTGCCT TTACAACTTA TGAGTATGAG 


4081 


TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 
AAGGAAAAAG TTATAATAAC TTCGTAAATA GTCCCAATAA CAGAGTACTC GCCTATGTAT AAACTTACAT AAATCTTTTT 


4161 


TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA 
ATTTGTTTAT CCCCAAGGCG CGTGTAAAGG GGCTTTTCAC GGTGGACTGC AGATTCTTTG GTAATAATAG TACTGTAATT 


4241 


CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTC 
GGATATTTTT ATCCGCATAG TGCTCCGGGA AAGCAG

1 TCGCGCGTTT CGGTGATGAC GGTGAAAACC TCTGACACAT GCAGCTCCCG
AGCGCGCAAA GCCACTACTG CCACTTTTGG AGACTGTGTA CGTCGAGGGC

51 GAGACGGTCA CAGCTTGTCT GTAAGCGGAT GCCGGGAGCA GACAAGCCCG
CTCTGCCAGT GTCGAACAGA CATTCGCCTA CGGCCCTCGT CTGTTCGGGC

101 TCAGGGCGCG TCAGCGGGTG TTGGCGGGTG TCGGGGCTGG CTTAACTATG
AGTCCCGCGC AGTCGCCCAC AACCGCCCAC AGCCCCGACC GAATTGATAC

151 CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATGAA GCTTTTTGCA
GCCGTAGTCT CGTCTAACAT GACTCTCACG TGGTATACTT CGAAAAACGT

StuI

201 AAAGCCTAGG CCTCCAAAAA AGCCTCCTCA CTACTTCTGG AATAGCTCAG
TTTCGGATCC GGAGGTTTTT TCGGAGGAGT GATGAAGACC TTATCGAGTC

251 AGGCCGAGGC GGCCTCGGCC TCTGCATAAA TAAAAAAAAT TAGTCAGCCA
TCCGGCTCCG CCGGAGCCGG AGACGTATTT ATTTTTTTTA ATCAGTCGGT

301 TGGGGCGGAG AATGGGCGGA ACTGGGCGGG GAGGGAATTA TTGGCTATTG
ACCCCGCCTC TTACCCGCCT TGACCCGCCC CTCCCTTAAT AACCGATAAC

351 GCCATTGCAT ACGTTGTATC TATATCATAA TATGTACATT TATATTGGCT
CGGTAACGTA TGCAACATAG ATATAGTATT ATACATGTAA ATATAACCGA

401 CATGTCCAAT ATGACCGCCA TGTTGACATT GATTATTGAC TAGTTATTAA
GTACAGGTTA TACTGGCGGT ACAACTGTAA CTAATAACTG ATCAATAATT

451 TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA TGGAGTTCCG
ATCATTAGTT AATGCCCCAG TAATCAAGTA TCGGGTATAT ACCTCAAGGC

501 CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC
GCAATGTATT GAATGCCATT TACCGGGCGG ACCGACTGGC GGGTTGCTGG

551 CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA
GGGCGGGTAA CTGCAGTTAT TACTGCATAC AAGGGTATCA TTGCGGTTAT

601 GGGACTTTCC ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA
CCCTGAAAGG TAACTGCAGT TACCCACCTC ATAAATGCCA TTTGACGGGT

651 CTTGGCAGTA CATCAAGTGT ATCATATGCC AAGTCCGCCC CCTATTGACG
GAACCGTCAT GTAGTTCACA TAGTATACGG TTCAGGCGGG GGATAACTGC

701 TCAATGACGG TAAATGGCCC GCCTGGCATT ATGCCCAGTA CATGACCTTA
AGTTACTGCC ATTTACCGGG CGGACCGTAA TACGGGTCAT GTACTGGAAT

751 CGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA TCGCTATTAC
GCCCTGAAAG GATGAACCGT CATGTAGATG CATAATCAGT AGCGATAATG

801 CATGGTGATG CGGTTTTGGC AGTACACCAA TGGGCGTGGA TAGCGGTTTG
GTACCACTAC GCCAAAACCG TCATGTGGTT ACCCGCACCT ATCGCCAAAC


851 ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG
TGAGTGCCCC TAAAGGTTCA GAGGTGGGGT AACTGCAGTT ACCCTCAAAC

901 TTTTGGCACC AAAATCAACG GGACTTTCCA AAATGTCGTA ATAACCCCGC
AAAACCGTGG TTTTAGTTGC CCTGAAAGGT TTTACAGCAT TATTGGGGCG

951 CCCGTTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA
GGGCAACTGC GTTTACCCGC CATCCGCACA TGCCACCCTC CAGATATATT

1001 GCAGAGCTCG TTTAGTGAAC CGTCAGATCG CCTGGAGACG CCATCCACGC
CGTCTCGAGC AAATCACTTG GCAGTCTAGC GGACCTCTGC GGTAGGTGCG

1051 TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC TCCGCGGCCG
ACAAAACTGG AGGTATCTTC TGTGGCCCTG GCTAGGTCGG AGGCGCCGGC

1101 GGAACGGTGC ATTGGAACGC GGATTCCCCG TGCCAAGAGT GACGTAAGTA
CCTTGCCACG TAACCTTGCG CCTAAGGGGC ACGGTTCTCA CTGCATTCAT

1151 CCGCCTATAG ACTCTATAGG CACACCCCTT TGGCTCTTAT GCATGCTATA
GGCGGATATC TGAGATATCC GTGTGGGGAA ACCGAGAATA CGTACGATAT

1201 CTGTTTTTGG CTTGGGGCCT ATACACCCCC GCTCCTTATG CTATAGGTGA
GACAAAAACC GAACCCCGGA TATGTGGGGG CGAGGAATAC GATATCCACT

1251 TGGTATAGCT TAGCCTATAG GTGTGGGTTA TTGACCATTA TTGACCACTC
ACCATATCGA ATCGGATATC CACACCCAAT AACTGGTAAT AACTGGTGAG

1301 CCCTATTGGT GACGATACTT TCCATTACTA ATCCATAACA TGGCTCTTTG
GGGATAACCA CTGCTATGAA AGGTAATGAT TAGGTATTGT ACCGAGAAAC

1351 CCACAACTAT CTCTATTGGC TATATGCCAA TACTCTGTCC TTCAGAGACT
GGTGTTGATA GAGATAACCG ATATACGGTT ATGAGACAGG AAGTCTCTGA

1401 GACACGGACT CTGTATTTTT ACAGGATGGG GTCCATTTAT TATTTACAAA
CTGTGCCTGA GACATAAAAA TGTCCTACCC CAGGTAAATA ATAAATGTTT

1451 TTCACATATA CAACAACGCC GTCCCCCGTG CCCGCAGTTT TTATTAAACA
AAGTGTATAT GTTGTTGCGG CAGGGGGCAC GGGCGTCAAA AATAATTTGT

1501 TAGCGTGGGA TCTCCGACAT CTCGGGTACG TGTTCCGGAC ATGGGCTCTT
ATCGCACCCT AGAGGCTGTA GAGCCCATGC ACAAGGCCTG TACCCGAGAA

1551 CTCCGGTAGC GGCGGAGCTT CCACATCCGA GCCCTGGTCC CATCCGTCCA
GAGGCCATCG CCGCCTCGAA GGTGTAGGCT CGGGACCAGG GTAGGCAGGT

1601 GCGGCTCATG GTCGCTCGGC AGCTCCTTGC TCCTAACAGT GGAGGCCAGA
CGCCGAGTAC CAGCGAGCCG TCGAGGAACG AGGATTGTCA CCTCCGGTCT

1651 CTTAGGCACA GCACAATGCC CACCACCACC AGTGTGCCGC ACAAGGCCGT
GAATCCGTGT CGTGTTACGG GTGGTGGTGG TCACACGGCG TGTTCCGGCA

1701 GGCGGTAGGG TATGTGTCTG AAAATGAGCT CGGAGATTGG GCTCGCACCT
CCGCCATCCC ATACACAGAC TTTTACTCGA GCCTCTAACC CGAGCGTGGA

1751 GGACGCAGAT GGAAGACTTA AGGCAGCGGC AGAAGAAGAT GCAGGCAGCT
CCTGCGTCTA CCTTCTGAAT TCCGTCGCCG TCTTCTTCTA CGTCCGTCGA

1801 GAGTTGTTGT ATTCTGATAA GAGTCAGAGG TAACTCCCGT TGCGGTGCTG
CTCAACAACA TAAGACTATT CTCAGTCTCC ATTGAGGGCA ACGCCACGAC

1851 TTAACGGTGG AGGGCAGTGT AGTCTGAGCA GTACTCGTTG CTGCCGCGCG
AATTGCCACC TCCCGTCACA TCAGACTCGT CATGAGCAAC GACGGCGCGC



1901 CGCCACCAGA CATAATAGCT GACAGACTAA CAGACTGTTC CTTTCCATGG 
GCGGTGGTCT GTATTATCGA CTGTCTGATT GTCTGACAAG GAAAGGTACC 



+2 MAP

EcoRI 

1951 GTCTTTTCTG CAGTCACCGT CGTCGACCTA AGAATTCACC ATGGCGCCCA 
CAGAAAAGAC GTCAGTGGCA GCAGCTGGAT TCTTAAGTGG TACCGCGGGT 



+2ITAY AQQ TRGL LGC IIT
2001 TCACGGCGTA CGCCCAGCAG ACAAGGGGCC TCCTAGGGTG CATAATCACC 
AGTGCCGCAT GCGGGTCGTC TGTTCCCCGG AGGATCCCAC GTATTAGTGG 



+2SLTG RDK NQV EGEV QIV 
2051 AGCCTAACTG GCCGGGACAA AAACCAAGTG GAGGGTGAGG TCCAGATTGT 
TCGGATTGAC CGGCCCTGTT TTTGGTTCAC CTCCCACTCC AGGTCTAACA



+ 2 STA AQTF LAT CIN GVC 
2101 GTCAACTGCT GCCCAAACCT TCCTGGCAAC GTGCATCAAT GGGGTGTGCT 
CAGTTGACGA CGGGTTTGGA AGGACCGTTG CACGTAGTTA CCCCACACGA 



+ 2WTVY HGA GTRT IAS PKG 
2151 GGACTGTCTA CCACGGGGCC GGAACGAGGA CCATCGCGTC ACCCAAGGGT 
CCTGACAGAT GGTGCCCCGG CCTTGCTCCT GGTAGCGCAG TGGGTTCCCA 



+2PVIQ MYT NVD QDLV GWP
2201 CCTGTCATCC AGATGTATAC CAATGTAGAC CAAGACCTTG TGGGCTGGCC
GGACAGTAGG TCTACATATG GTTACATCTG GTTCTGGAAC ACCCGACCGG 



+2 ASQ GTRS LTP CTC GSS 
2251 CGCTTCGCAA GGTACCCGCT CATTGACACC CTGCACTTGC GGCTCCTCGG 
GCGAAGCGTT CCATGGGCGA GTAACTGTGG GACGTGAACG CCGAGGAGCC 



+2DLYL VTR HADV IPV RRR
2301 ACCTTTACCT GGTCACGAGG CACGCCGATG TCATTCCCGT GCGCCGGCGG 
TGGAAATGGA CCAGTGCTCC GTGCGGCTAC AGTAAGGGCA CGCGGCCGCC 



+2GDSR GSL LSP RPIS YLK 
2351 GGTGATAGCA GGGGCAGCCT GCTGTCGCCC CGGCCCATTT CCTACTTGAA 
CCACTATCGT CCCCGTCGGA CGACAGCGGG GCCGGGTAAA GGATGAACTT 



+2 GSS GGPL LCP AGH AVG 
2401 AGGCTCCTCG GGGGGTCCGC TGTTGTGCCC CGCGGGGCAC GCCGTGGGCA
TCCGAGGAGC CCCCCAGGCG ACAACACGGG GCGCCCCGTG CGGCACCCGT 



+ 2IFRA AVC TRGV AKA VDF 
2451 TATTTAGGGC CGCGGTGTGC ACCCGTGGAG TGGCTAAGGC GGTGGACTTT
ATAAATCCCG GCGCCACACG TGGGCACCTC ACCGATTCCG CCACCTGAAA 



+2IPVE NLE TTM RSPV FTD
2501 ATCCCTGTGG AGAACCTAGA GACAACCATG AGGTCCCCGG TGTTCACGGA
TAGGGACACC TCTTGGATCT CTGTTGGTAC TCCAGGGGCC ACAAGTGCCT

+2 NSS PPVV PQS FQV AHL
2551 TAACTCCTCT CCACCAGTAG TGCCCCAGAG CTTCCAGGTG GCTCACCTCC 
ATTGAGGAGA GGTGGTCATC ACGGGGTCTC GAAGGTCCAC CGAGTGGAGG 



+2HAPT GSG KSTK VPA AYA 
2601 ATGCTCCCAC AGGCAGCGGC AAAAGCACCA AGGTCCCGGC TGCATATGCA 
TACGAGGGTG TCCGTCGCCG TTTTCGTGGT TCCAGGGCCG ACGTATACGT 



+2AQGY KVL VLN PSVA ATL
2651 GCTCAGGGCT ATAAGGTGCT AGTACTCAAC CCCTCTGTTG CTGCAACACT
CGAGTCCCGA TATTCCACGA TCATGAGTTG GGGAGACAAC GACGTTGTGA



+2 GFG AYMS KAH GID PNI
2701 GGGCTTTGGT GCTTACATGT CCAAGGCTCA TGGGATCGAT CCTAACATCA 
CCCGAAACCA CGAATGTACA GGTTCCGAGT ACCCTAGCTA GGATTGTAGT 



+ 2RTGV RTI TTGS PIT YST 
2751 GGACCGGGGT GAGAACAATT ACCACTGGCA GCCCCATCAC GTACTCCACC 
CCTGGCCCCA CTCTTGTTAA TGGTGACCGT CGGGGTAGTG CATGAGGTGG



+2YGKF LAD GGC SGGA YDI 
2801 TACGGCAAGT TCCTTGCCGA CGGCGGGTGC TCGGGGGGCG CTTATGACAT
ATGCCGTTCA AGGAACGGCT GCCGCCCACG AGCCCCCCGC GAATACTGTA 



+2 IIC DECH STD ATS ILG
2851 AATAATTTGT GACGAGTGCC ACTCCACGGA TGCCACATCC ATCTTGGGCA 
TTATTAAACA CTGCTCACGG TGAGGTGCCT ACGGTGTAGG TAGAACCCGT 



+2IGTV LDQ AETA GAR LVV
2901 TTGGCACTGT CCTTGACCAA GCAGAGACTG CGGGGGCGAG ACTGGTTGTG 
AACCGTGACA GGAACTGGTT CGTCTCTGAC GCCCCCGCTC TGACCAACAC 



+2 LATA TPP GSV TVPH PNI 
2951 CTCGCCACCG CCACCCCTCC GGGCTCCGTC ACTGTGCCCC ATCCCAACAT 
GAGCGGTGGC GGTGGGGAGG CCCGAGGCAG TGACACGGGG TAGGGTTGTA 



+2 EEV ALST TGE IPF YGK
3001 CGAGGAGGTT GCTCTGTCCA CCACCGGAGA GATCCCTTTT TACGGCAAGG 
GCTCCTCCAA CGAGACAGGT GGTGGCCTCT CTAGGGAAAA ATGCCGTTCC 



+2AIPL EVI KGGR HLI FCH
3051 CTATCCCCCT CGAAGTAATC AAGGGGGGGA GACATCTCAT CTTCTGTCAT 
GATAGGGGGA GCTTCATTAG TTCCCCCCCT CTGTAGAGTA GAAGACAGTA 



+2SKKK CDE LAA KLVA LGI 
3101 TCAAAGAAGA AGTGCGACGA ACTCGCCGCA AAGCTGGTCG CATTGGGCAT 
AGTTTCTTCT TCACGCTGCT TGAGCGGCGT TTCGACCAGC GTAACCCGTA 



+2 NAV AYYR GLD VSV IPT 
3151 CAATGCCGTG GCCTACTACC GCGGTCTTGA CGTGTCCGTC ATCCCGACCA 
GTTACGGCAC CGGATGATGG CGCCAGAACT GCACAGGCAG TAGGGCTGGT 



+2SGDV VVV ATDA LMT GYT 
3201 GCGGCGATGT TGTCGTCGTG GCAACCGATG CCCTCATGAC CGGCTATACC
CGCCGCTACA ACAGCAGCAC CGTTGGCTAC GGGAGTACTG GCCGATATGG

+2
3251 


GDFD SVI DCN TCVT QTV 
GGCGACTTCG ACTCGGTGAT AGACTGCAAT ACGTGTGTCA CCCAGACAGT 
CCGCTGAAGC TGAGCCACTA TCTGACGTTA TGCACACAGT GGGTCTGTCA 


+ 2 
3301 


DFS LDPT FTI ETI TLP 
CGATTTCAGC CTTGACCCTA CCTTCACCAT TGAGACAATC ACGCTCCCCC 
GCTAAAGTCG GAACTGGGAT GGAAGTGGTA ACTCTGTTAG TGCGAGGGGG 


+2 
3351 


QDAV SRT QRRG RTG RGK 
AAGATGCTGT CTCCCGCACT CAACGTCGGG GCAGGACTGG CAGGGGGAAG 
TTCTACGACA GAGGGCGTGA GTTGCAGCCC CGTCCTGACC GTCCCCCTTC 


+ 2 
3401 


PGIY RFV APG ERPS GMF
CCAGGCATCT ACAGATTTGT GGCACCGGGG GAGCGCCCCT CCGGCATGTT
GGTCCGTAGA TGTCTAAACA CCGTGGCCCC CTCGCGGGGA GGCCGTACAA


+ 2 
3451 


DSSVLCECYDAGCAWY
CGACTCGTCC GTCCTCTGTG AGTGCTATGA CGCAGGCTGT GCTTGGTATG
GCTGAGCAGG CAGGAGACAC TCACGATACT GCGTCCGACA CGAACCATAC


+ 2 
3501 


ELTP AET TVRL RAY MNT
AGCTCACGCC CGCCGAGACT ACAGTTAGGC TACGAGCGTA CATGAACACC
TCGAGTGCGG GCGGCTCTGA TGTCAATCCG ATGCTCGCAT GTACTTGTGG


+ 2 
3551 


PGLP VCQ DHL EFWE GVF
CCGGGGCTTC CCGTGTGCCA GGACCATCTT GAATTTTGGG AGGGCGTCTT
GGCCCCGAAG GGCACACGGT CCTGGTAGAA CTTAAAACCC TCCCGCAGAA


+2 


TGL THID AHF LSQ TKQ 
StuI 


3601 


TACAGGCCTC ACTCATATAG ATGCCCACTT TCTATCCCAG ACAAAGCAGA 
ATGTCCGGAG TGAGTATATC TACGGGTGAA AGATAGGGTC TGTTTCGTCT


+2 
3651 


SGENLPYLVAYQATVCA 
GTGGGGAGAA CCTTCCTTAC CTGGTAGCGT ACCAAGCCAC CGTGTGCGCT 
CACCCCTCTT GGAAGGAATG GACCATCGCA TGGTTCGGTG GCACACGCGA 


+ 2 
3701 


RAQAPPPSWDQMWKCLI
AGGGCTCAAG CCCCTCCCCC ATCGTGGGAC CAGATGTGGA AGTGTTTGAT
TCCCGAGTTC GGGGAGGGGG TAGCACCCTG GTCTACACCT TCACAAACTA


+2 
3751 


RLK PTLH GPT PLL YRL 
TCGCCTCAAG CCCACCCTCC ATGGGCCAAC ACCCCTGCTA TACAGACTGG 
AGCGGAGTTC GGGTGGGAGG TACCCGGTTG TGGGGACGAT ATGTCTGACC


+ 2 
3801 


GAVQNEITLTHPVTKYI 
GCGCTGTTCA GAATGAAATC ACCCTGACGC ACCCAGTCAC CAAATACATC 
CGCGACAAGT CTTACTTTAG TGGGACTGCG TGGGTCAGTG GTTTATGTAG 


+2 
3851 


MTCM SAD LEV VTST WVL
ATGACATGCA TGTCGGCCGA CCTGGAGGTC GTCACGAGCA CCTGGGTGCT 
TACTGTACGT ACAGCCGGCT GGACCTCCAG CAGTGCTCGT GGACCCACGA 


+2 
3901 


VGG VLAA LAA YCL STG
CGTTGGCGGC GTCCTGGCTG CTTTGGCCGC GTATTGCCTG TCAACAGGCT
GCAACCGCCG CAGGACCGAC GAAACCGGCG CATAACGGAC AGTTGTCCGA

+2CVVI VGR VVLS GKP AII
3951 GCGTGGTCAT AGTGGGCAGG GTCGTCTTGT CCGGGAAGCC GGCAATCATA
CGCACCAGTA TCACCCGTCC CAGCAGAACA GGCCCTTCGG CCGTTAGTAT



+ 2PDRE VLY REF DEME EC 
4001 CCTGACAGGG AAGTCCTCTA CCGAGAGTTC GATGAGATGG AAGAGTGCTA 
GGACTGTCCC TTCAGGAGAT GGCTCTCAAG CTACTCTACC TTCTCACGAT 



BamHI MluI



4051 GGATCCACTA CGCGTTAGAG CTCGCTGATC AGCCTCGACT GTGCCTTCTA 
CCTAGGTGAT GCGCAATCTC GAGCGACTAG TCGGAGCTGA CACGGAAGAT



4101 GTTGCCAGCC ATCTGTTGTT TGCCCCTCCC CCGTGCCTTC CTTGACCCTG 
CAACGGTCGG TAGACAACAA ACGGGGAGGG GGCACGGAAG GAACTGGGAC



4151 GAAGGTGCCA CTCCCACTGT CCTTTCCTAA TAAAATGAGG AAATTGCATC 
CTTCCACGGT GAGGGTGACA GGAAAGGATT ATTTTACTCC TTTAACGTAG 



4201 GCATTGTCTG AGTAGGTGTC ATTCTATTCT GGGGGGTGGG GTGGGGCAGG 
CGTAACAGAC TCATCCACAG TAAGATAAGA CCCCCCACCC CACCCCGTCC 



4251 ACAGCAAGGG GGAGGATTGG GAAGACAATA GCAGGCATGC TGGGGAGCTC 
TGTCGTTCCC CCTCCTAACC CTTCTGTTAT CGTCCGTACG ACCCCTCGAG 



4301 TTCCGCTTCC TCGCTCACTG ACTCGCTGCG CTCGGTCGTT CGGCTGCGGC 
AAGGCGAAGG AGCGAGTGAC TGAGCGACGC GAGCCAGCAA GCCGACGCCG 



4351 GAGCGGTATC AGCTCACTCA AAGGCGGTAA TACGGTTATC CACAGAATCA 
CTCGCCATAG TCGAGTGAGT TTCCGCCATT ATGCCAATAG GTGTCTTAGT 



4401 GGGGATAACG CAGGAAAGAA CATGTGAGCA AAAGGCCAGC AAAAGGCCAG
CCCCTATTGC GTCCTTTCTT GTACACTCGT TTTCCGGTCG TTTTCCGGTC 



4451 GAACCGTAAA AAGGCCGCGT TGCTGGCGTT TTTCCATAGG CTCCGCCCCC 
CTTGGCATTT TTCCGGCGCA ACGACCGCAA AAAGGTATCC GAGGCGGGGG 



4501 CTGACGAGCA TCACAAAAAT CGACGCTCAA GTCAGAGGTG GCGAAACCCG 
GACTGCTCGT AGTGTTTTTA GCTGCGAGTT CAGTCTCCAC CGCTTTGGGC 



4551 ACAGGACTAT AAAGATACCA GGCGTTTCCC CCTGGAAGCT CCCTCGTGCG 
TGTCCTGATA TTTCTATGGT CCGCAAAGGG GGACCTTCGA GGGAGCACGC 



4601 CTCTCCTGTT CCGACCCTGC CGCTTACCGG ATACCTGTCC GCCTTTCTCC 
GAGAGGACAA GGCTGGGACG GCGAATGGCC TATGGACAGG CGGAAAGAGG 



4651 CTTCGGGAAG CGTGGCGCTT TCTCAATGCT CACGCTGTAG GTATCTCAGT
GAAGCCCTTC GCACCGCGAA AGAGTTACGA GTGCGACATC CATAGAGTCA 



4701 TCGGTGTAGG TCGTTCGCTC CAAGCTGGGC TGTGTGCACG AACCCCCCGT 
AGCCACATCC AGCAAGCGAG GTTCGACCCG ACACACGTGC TTGGGGGGCA 



4751 TCAGCCCGAC CGCTGCGCCT TATCCGGTAA CTATCGTCTT GAGTCCAACC 
AGTCGGGCTG GCGACGCGGA ATAGGCCATT GATAGCAGAA CTCAGGTTGG



4801 CGGTAAGACA CGACTTATCG CCACTGGCAG CAGCCACTGG TAACAGGATT 
GCCATTCTGT GCTGAATAGC GGTGACCGTC GTCGGTGACC ATTGTCCTAA 
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4851 


AGCAGAGCGA 
TCGTCTCGCT 


GGTATGTAGG 
CCATACATCC 


CGGTGCTACA 
GCCACGATGT 


GAGTTCTTGA 
CTCAAGAACT 


AGTGGTGGCC 
TCACCACCGG 


4901 


TAACTACGGC 
ATTGATGCCG 


TACACTAGAA 
ATGTGATCTT 


GGACAGTATT 
CCTGTCATAA 


TGGTATCTGC 
ACCATAGACG 


GCTCtGCTGA 
CGAGACGACT 


4951 


AGCCAGTTAC 
TCGGTCAATG 


CTTCGGAAAA 
GAAGCCTTTT 


AGAGTTGGTA 
TCTCAACCAT 


GCTCTTGATC 
CGAGAACTAG 


CGGCAAACAA 
GCCGTTTGTT 


5001 


ACCACCGCTG 
TGGTGGCGAC 


GTAGCGGTGG 
CATCGCCACC 


TTTTTTTGTT 
AAAAAAACAA 


TGCAAGCAGC 
ACGTTCGTCG 


AGATTACGCG 
TCTAATGCGC 


5051 


CAGAAAAAAA 
GTCTTTTTTT 


GGATCTCAAG 
CCTAGAGTTC 


AAGATCCTTT 
TTCTAGGAAA 


GATCTTTTCT 
CTAGAAAAGA 


ACGGGGTCTG 
TGCCCCAGAC 


5101 


ACGCTCAGTG 
TGCGAGTCAC 


GAACGAAAAC 
CTTGCTTTTG 


TCACGTTAAG 
AGTGCAATTC 


GGATTTTGGT 
CCTAAAACCA 


CATGAGATTA 
GTACTCTAAT 


5151 


TCAAAAAGGA 
AGTTTTTCCT 


TCTTCACCTA 
AGAAGTGGAT 


GATCCTTTTA 
CTAGGAAAAT 


AATTAAAAAT 
TTAATTTTTA 


GAAGTTTTAA 
CTTCAAAATT 


5201 


ATCAATCTAA 
TAGTTAGATT 


AGTATA7ATG 
TCATATATAC 


AGTAAACTTG 
TCATTTGAAC 


GTCTGACAGT 
CAGACTGTCA 


TACCAATGCT 
ATGGTTACGA 


5251 


TAATCAGTGA 
ATTAGTCACT. 


GGCACCTATC 
CCGTGGATAG 


TCAGCGATCT 
AGTCGCTAGA 


GTCTATTTCG 
CAGATAAAGC 


TTCATCCATA 
AAGTAGGTAT 


5301 GTTGCCTGAC TCCCCGTCGT GTAGATAACT ACGATACGGG AGGGCTTACC
CAACGGACTG AGGGGCAGCA CATCTATTGA TGCTATGCCC TCCCGAATGG


5351 ATCTGGCCCC AGTGCTGCAA TGATACCGCG AGACCCACGC TCACCGGCTC
TAGACCGGGG TCACGACGTT ACTATGGCGC TCTGGGTGCG AGTGGCCGAG


5401 CAGATTTATC AGCAATAAAC CAGCCAGCCG GAAGGGCCGA GCGCAGAAGT
GTCTAAATAG TCGTTATTTG GTCGGTCGGC CTTCCCGGCT CGCGTCTTCA


5451 GGTCCTGCAA CTTTATCCGC CTCCATCCAG TCTATTAATT GTTGCCGGGA
CCAGGACGTT GAAATAGGCG GAGGTAGGTC AGATAATTAA CAACGGCCCT


5501 AGCTAGAGTA AGTAGTTCGC CAGTTAATAG TTTGCGCAAC GTTGTTGCCA
TCGATCTCAT TCATCAAGCG GTCAATTATC AAACGCGTTG CAACAACGGT


5551 TTGCTACAGG CATCGTGGTG TCACGCTCGT CGTTTGGTAT GGCTTCATTC
AACGATGTCC GTAGCACCAC AGTGCGAGCA GCAAACCATA CCGAAGTAAG


5601 AGCTCCGGTT CCCAACGATC AAGGCGAGTT ACATGATCCC CCATGTTGTG
TCGAGGCCAA GGGTTGCTAG TTCCGCTCAA TGTACTAGGG GGTACAACAC


5651 CAAAAAAGCG GTTAGCTCCT TCGGTCCTCC GATCGTTGTC AGAAGTAAGT
GTTTTTTCGC CAATCGAGGA AGCCAGGAGG CTAGCAACAG TCTTCATTCA


5701 TGGCCGCAGT GTTATCACTC ATGGTTATGG CAGCACTGCA TAATTCTCTT
ACCGGCGTCA CAATAGTGAG TACCAATACC GTCGTGACGT ATTAAGAGAA


5751 ACTGTCATGC CATCCGTAAG ATGCTTTTCT GTGACTGGTG AGTACTCAAC
TGACAGTACG GTAGGCATTC TACGAAAAGA CACTGACCAC TCATGAGTTG

5801 CAAGTCATTC TGAGAATAGT GTATGCGGCG ACCGAGTTGC TCTTGCCCGG
GTTCAGTAAG ACTCTTATCA CATACGCCGC TGGCTCAACG AGAACGGGCC


5851 CGTCAATACG GGATAATACC GCGCCACATA GCAGAACTTT AAAAGTGCTC
GCAGTTATGC CCTATTATGG CGCGGTGTAT CGTCTTGAAA TTTTCACGAG


5901 ATCATTGGAA AACGTTCTTC GGGGCGAAAA CTCTCAAGGA TCTTACCGCT
TAGTAACCTT TTGCAAGAAG CCCCGCTTTT GAGAGTTCCT AGAATGGCGA


5951 GTTGAGATCC AGTTCGATGT AACCCACTCG TGCACCCAAC TGATCTTCAG
CAACTCTAGG TCAAGCTACA TTGGGTGAGC ACGTGGGTTG ACTAGAAGTC


6001 CATCTTTTAC TTTCACCAGC GTTTCTGGGT GAGCAAAAAC AGGAAGGCAA
GTAGAAAATG AAAGTGGTCG CAAAGACCCA CTCGTTTTTG TCCTTCCGTT


6051 AATGCCGCAA AAAAGGGAAT AAGGGCGACA CGGAAATGTT GAATACTCAT
TTACGGCGTT TTTTCCCTTA TTCCCGCTGT GCCTTTACAA CTTATGAGTA


6101 ACTCTTCCTT TTTCAATATT ATTGAAGCAT TTATCAGGGT TATTGTCTCA
TGAGAAGGAA AAAGTTATAA TAACTTCGTA AATAGTCCCA ATAACAGAGT


6151 TGAGCGGATA CATATTTGAA TGTATTTAGA AAAATAAACA AATAGGGGTT
ACTCGCCTAT GTATAAACTT ACATAAATCT TTTTATTTGT TTATCCCCAA


6201 CCGCGCACAT TTCCCCGAAA AGTGCCACCT GACGTCTAAG AAACCATTAT
GGCGCGTGTA AAGGGGCTTT TCACGGTGGA CTGCAGATTC TTTGGTAATA


6251 TATCATGACA TTAACCTATA AAAATAGGCG TATCACGAGG CCCTTTCGTC
ATAGTACTGT AATTGGATAT TTTTATCCGC ATAGTGCTCC GGGAAAGCAG

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuVal
2 AGCTTACAAAACAAATTCACCATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTA
TCGAATGTTTTGTTTAAGTGGTACCGACGTATACGTCGAGTCCCGATATTCCACGATCAT

1 HIND3, 21 NCOI, 30 NDEI, 58 SCAI,

LeuAsnProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGly
62 CTCAACCCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGG 
GAGTTGGGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCC 

IleAspProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyr
122 ATCGATCCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTAC 
TAGCTAGGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATG 

122 CLAI, 

SerThrTyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIle
182 TCCACCTACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATA
AGGTGGATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTAT 

IleCysAspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeu
242 ATTTGTGACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTT 
TAAACACTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAA 

AspGlnAlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGly 
302 GACCAAGCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGC 
CTGGTTCGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCG 

A 

309 ALWNI,

SerValThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIle
362 TCCGTCACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATC 
AGGCAGTGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAG 

ProPheTyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePhe
422 CCTTTTTACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTC
GGAAAAATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAG 

CysHisSerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsn
482 TGTCATTCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAAT
ACAGTAAGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTA 

AlaValAlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValVal
542 GCCGTGGCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTC
CGGCACCGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAG 

A A 

556 SAC2, 566 DRDI,

ValValAlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAsp
602 GTCGTGGCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGAC 
CAGCACCGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTG 

A 

621 BSPHI,

CysAsnThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGlu
662 TGCAATACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAG
ACGTTATGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTC 

ThrIleThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArg
722 ACAATCACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGG 
TGTTAGTGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCC 

GlyLysProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAsp
782 GGGAAGCCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGAC
CCCTTCGGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTG 

822 BGLI, 839 DRDI,

SerSerValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAla
842 TCGTCCGTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCC
AGCAGGCAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGG 

887 SACI, 

GluThrThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAsp 
902 GAGACTACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGAC 
CTCTGATGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTG 

A 

937 SMAI XMAI,

HisLeuGluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeu
962 CATCTTGAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTA
GTAGAACTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGAT 

A 

991 STUI, 

SerGlnThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrVal
1022 TCCCAGACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTG
AGGGTCTGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCAC 

A 

1075 DRA3, 

CysAlaArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArg 
1082 TGCGCTAGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGC
ACGCGATCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCG 

LeuLysProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsn 
1142 CTCAAGCCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAAT 
GAGTTCGGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTA 

A 

1156 NCOI, 

GluIleThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeu
1202 GAAATCACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTG
CTTTAGTGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGAC 

AAA A A 

1236 BSPHI, 1240 DRDI, 1243 AVA3, 1251 EAGI XMA3, 1256 DRDI,



GluValValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyr 
1262 GAGGTCGTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTAT 
CTCCAGCAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATA

CysLeuSerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAla
1322 TGCCTGTCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCA 
ACGGACAGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGT 

1375 NAEI, 

IleIleProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGln
1382 ATCATACCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAG 
TAGTATGGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTC 

1391 DRDI,

HisLeuProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeu
1442 CACTTACCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTC
GTGAATGGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAG 

GlyLeuLeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsn
1502 GGCCTCCTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAAC 
CCGGAGGACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTG

A A 

1508 PSTI, 1513 TTH3I, 

TrpGlnLysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGln
1562 TGGCAAAAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAA 
ACCGTTTTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTT 

A A 

1571 XHOI, 1592 NDEI, 

TyrLeuAlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPhe
1622 TACTTGGCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTT 
ATGAACCGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAA 

A 

1649 BSTE2, 

ThrAlaAlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGly
1682 ACAGCTGCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGG 
TGTCGACGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCC 

A 

1683 ALWNI PVU2,

GlyTrpValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGly
1742 GGGTGGGTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGC
CCCACCCACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCG 

A 

1800 ESPI,

LeuAlaGlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAla
1802 TTAGCTGGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCA 
AATCGACCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGT 

1808 KASI NARI,

GlyTyrGlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluVal 
1862 GGGTATGGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTC 
CCCATACCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAG 

1884 SACI, 1905 BSPHI,

ProSerThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuVal
1922 CCCTCCACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTA
GGGAGGTGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCAT 

1934 TTH3I, 

ValGlyValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaVal
1982 GTCGGCGTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTG 
CAGCCGCACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCAC 

2010 NAEI, 2023 SMAI XMAI, 

GlnTrpMetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHis
2042 CAGTGGATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCAC 
GTCACCTACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTG 

A ^ 

2073 SMAI XMAI, 2099 DRA3, 

TyrValProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrVal
2102 TACGTGCCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTA 
ATGCACGGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACAT 

A 

2121 PVU2, 

ThrGlnLeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSer 
2162 ACCCAGCTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCC 
TGGGTCGAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGG 

A /\ 

2165 ALWNI, 2170 MST2,

GlySerTrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThr 
2222 GGTTCCTGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACC 

CCAAGGACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGG

2226 ECONI,

TrpLeuLysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArg
2282 TGGCTAAAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGC 
ACCGATTTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCG 

A /s 

2291 ESPI, 2306 PVU2, 2316 BAMHI,

GlyTyrLysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAla
2342 GGGTATAAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCT
CCCATATTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGA 

GluIleThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArg
2402 GAGATCACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGG 
CTCTAGTGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCC 

A AAA 

2431 BSABI, 2447 AVR2, 2454 SSE8387I, 2455 PSTI,

AsnMetTrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeu
2462 AACATGTGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTT 
TTGTACACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAA 

2486 ASEI, 2503 APAI,

ProAlaProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIle
2522 CCTGCGCCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATA 
GGACGCGGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTAT 

2559 PSTI, 

ArgGlnValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysPro 
2582 AGGCAGGTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCG
TCCGTCCACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGC

2600 DRA3,

CysGlnValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPhe 
2642 TGCCAGGTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTT 
ACGGTCCAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAA 

AlaProProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGlu 
2702 GCGCCCCCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAA 
CGCGGGGGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTT 

TyrProValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSer
2762 TACCCGGTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCC 
ATGGGCCATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGG 

A A 

2763 HGIE2, 2815 AAT2, 

MetLeuThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGly
2822 ATGCTCACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGA 
TACGAGTGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCT 

2856 EAGI XMA3,

SerProProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAla 
2882 TCACCCCCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCA 
AGTGGGGGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGT

2895 BALI, 2909 NHEI,

ThrCysThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrp
2942 ACTTGCACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGG 
TGAACGTGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACC 

A A 

2972 ESPI, 2975 SACI,

ArgGlnGluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeu
3002 AGGCAGGAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTG 
TCCGTCCTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGAC 

AspSerPheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGlu
3062 GACTCCTTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAA 
CTGAGGAAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTT 

A 

3102 BGL2, 

IleLeuArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyr
3122 ATCCTGCGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTAT 
TAGGACGCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATA 

3149 ALWNI, 3170 EAGI XMA3,

AsnProProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGly 
3182 AACCCCCCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGC 
TTGGGGGGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCG 

3223 HGIE2, 3235 NCOI, 

CysProLeuProProProLysSerProProValProProProArgLysLysArgThrVal 
3242 TGCCCGCTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTG 
ACGGGCGAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCAC 

ValLeuThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGly 
3302 GTCCTCACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGC 
CAGGAGTGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCG 

3338 SACI, 3352 HIND3,

SerSerSerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaPro
3362 AGCTCCTCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCT 
TCGAGGAGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGA 

SerGlyCysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGly 
3422 TCTGGCTGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGG 
AGACCGACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCC 

3443 EAM1105I,

GluProGlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsn 
3482 GAGCCTGGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAAC 
CTCGGACCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTG 



A A A 



3490 BAMHI, 3491 BSABI, 3493 BSPEI,

AlaGluAspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrPro
3542 GCGGAGGATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCG 
CGCCTCCTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGC 

A 

3595 DRA3, 

CysAlaAlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHis 
3602 TGCGCCGCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCAC 
ACGCGGCGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTG 

^ A ^ 

3606 SAC2, 3617 ALWNI, 3661 PFLMI,

HisAsnLeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThr 
3662 CACAATTTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACA 
GTGTTAAACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGT 

A 

3687 DRA3, 

PheAspArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAla
3722 TTTGACAGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCA
AAACTGTCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGT 

AlaAlaSerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrPro
3782 GCGGCGTCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCC 
CGCCGCAGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGG 

3822 HIND3,

ProHisSerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArg 
3842 CCACACTCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGA 
GGTGTGAGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCT 

/S A. 

3881 AAT2, 3896 BGLI,

LysAlaValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrPro
3902 AAGGCCGTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCA 
TTCCGGCATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGT 

IleAspThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGly
3962 ATAGACACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGT 
TATCTGTGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCA 

ArgLysProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMet
4022 CGTAAGCCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATG 
GCATTCGGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTAC 

AlaLeuTyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPhe
4082 GCTTTGTACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTC 
CGAAACATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAG 

GlnTyrSerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThr
4142 CAATACTCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACC
GTTATGAGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGG 

4166 ECORI, 

ProMetGlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIle
4202 CCAATGGGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATC 
GGTTACCCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAG

H'' A A 

4235 DRDI, 4242 ALWNI,

ArgThrGluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIle
4262 CGTACGGAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATC 
GCATGCCTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAG 

A A 

4307 BGLI, 4314 BALI, 

LysSerLeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsn 
4322 AAGTCCCTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAAC
TTCAGGGAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTG 

A 

4351 APAI,

CysGlyTyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeu
4382 TGCGGCTATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTC
ACGCCGATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAG

ThrCysTyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMet
4442 ACTTGCTACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATG 
TGAACGATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTAC 

4458 SMAI XMAI, 

LeuValCysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAla
4502 CTCGTGTGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCG 
GAGCACACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGC

4514 DRDI, 4517 TTH3I,

AlaSerLeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspPro
4562 GCGAGCCTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCC
CGCTCGGACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGG 

ProGlnProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAla 
4622 CCACAACCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCC
GGTGTTGGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGG 

4643 SACI, 

HisAspGlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAla
4682 CACGACGGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCG
GTGCTGCCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGC 

4737 NRUI, 

ArgAlaAlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIle
4742 AGAGCTGCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATC
TCTCGACGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAG 

MetPheAlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeu
4802 ATGTTTGCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTT
TACAAACGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAA 

4812 PFLMI, 4813 DRA3,

IleAlaArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSer
4862 ATAGCCAGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCC
TATCGGTCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGG 

A 

4899 BGL2,

IleGluProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSer
4922 ATAGAACCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCA
TATCTTGGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGT 

4960 NCOI, 

LeuHisSerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGly
4982 CTCCACAGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGG
GAGGTGTCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCC 

5021 SPHI, 5041 KPNI, 

ValProProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAla
5042 GTACCGCCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCC
CATGGCGGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGG

5070 APAI, 5097 BALI, 

ArgGlyGlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLys
5102 AGAGGAGGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAG
TCTCCTCCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTC 

5119 NDEI, 

LeuLysLeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAla
5162 CTCAAACTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCT
GAGTTTGAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGA 

A A A A 

5180 NOTI, 5181 EAGI XMA3, 5188 BALI, 5192 PVU2,

GlyTyrSerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrp
5222 GGCTACAGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGG 
CCGATGTCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACC 

A 

5246 DRA3,

PheCysLeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 TTTTGCCTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAAGG 
AAAACGGATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTCC 

^ A 

5301 PSTI, 5331 HGIE2, 



5342 TTGGGGTAAACACTCCGGCCTAAAAAAAAAAAAAAATCTAGAACCCGAGTCGAC 
AACCCGATTTGTGAGGCCGGATTTTTTTTTTTTTTTAGATCTTGGGCTCAGCTG 

A A 

5378 XBAI, 5390 SALI,

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG

303 ALWNI,

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC

550 SAC2, 560 DRDI,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA 

A 

615 BSPHI,

662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG

816 BGLI, 833 DRDI,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC 

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG

1230 BSPHI, 1234 DRDI, 1237 AVA3, 1245 EAGI XMA3, 1250 DRDI,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRDI,

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWNI PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESPI,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

A 

1802 KASI NARI,

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

A A 

1878 SACI, 1899 BSPHI,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWNI,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECONI,

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

2285 ESPI, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

2425 BSABI, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

2480 ASEI, 2497 APAI,

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3,

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAGl XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI, 2903 NHEI, 

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

A A 

2966 ESPl, 2969 SACI, 

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

A 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG

3143 ALWNl, 3164 EAGl XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

A A 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

A /V 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

A A A 

3484 BAMHI, 3485 BSABl, 3487 BSPEl, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG

A A 

3589 DRA3, 3600 SAC2, 

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

^ A 

3611 ALWNl, 3655 PFLMl, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

A 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3,

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC 
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC 

4229 DRDl, 4236 ALWNl, 

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI, 

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI, 

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRDI, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLMl, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI, 

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAGl XMA3, 5182 BALI, 5186 PVU2, 

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

5240 DRA3,

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgOP
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGATGAATAGTCGAC
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTACTTATCAGCTG 

5295 PSTI, 5336 SALI,

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA 
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal 
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

A 

303 ALWNI,

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRDl, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPHl, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRDl, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT 
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

FIGURE 17. Page 5

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG

-A 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

1230 BSPHl, 1234 DRDl, 1237 AVA3, 1245 EAGl XMA3, 1250 DRDl, 



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRDl, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I,

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWNI PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT 
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESPl, 

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KASl NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPHl, 

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

A A 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWNl, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECONl, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

^ A A 

2285 ESPl, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG 
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A AAA 

2425 BSABI, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASEl, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

A 

2594 DRA3,

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC 
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

A 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAGl XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI, 2903 NHEI,

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC

2966 ESPI, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWNl, 3164 EAGl XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG

3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

3484 BAMHI, 3485 BSABl, 3487 BSPEl, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG

3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

A A 

3611 ALWNl, 3655 PFLMl, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla 
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

A A 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC 
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

A 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC

4229 DRDI, 4236 ALWNI,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC 
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI, 

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRDl, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC 
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLMl, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA 
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI,

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAGl XMA3, 5182 BALI, 5186 PVU2, 

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

A 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

5380 NOTI, 5381 EAGl XMA3, 5390 AAT2, 5401 SMAI XMAI, 

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG 
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC

5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT 
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

A AAA 

5548 ALWNl, 5558 ESPl, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysOC AM 
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGTAATAGTCG
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCATTATCAGC 

5650 APAI, 5698 SALI, 



5702 AC 
TG

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG

1 HIND3, 24 NDEI, 52 SCAI, 

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI,

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC 
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT 

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG

303 ALWNl, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRDl, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRDI,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG 
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC 

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC 
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

^^^ ^ ^

1230 BSPHI, 1234 DRDI, 1237 AVA3, 1245 EAGI XMA3, 1250 DRDI,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

1385 DRDI,

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

1643 BSTE2, 1677 ALWNI PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla 
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESPI,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KASI NARI,

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer 
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPHI,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

^

1928 TTH3I,

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

^ ^

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

^ ^

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWNI,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

^ ^

2164 MST2, 2220 ECONI,

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT 
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

^ ^ ^

2285 ESPI, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

^ ^ ^ ^

2425 BSABI, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

^ ^

2480 ASEI, 2497 APAI,
ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC

2553 PSTI,

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG 
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC 
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

2850 EAGI XMA3,

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

2889 BALI, 2903 NHEI, 

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

2966 ESPI, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWNI, 3164 EAGI XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

^ ^

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC 
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

^ ^

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG 
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

^ ^ ^

3484 BAMHI, 3485 BSABI, 3487 BSPEI,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG 

^ ^

3589 DRA3, 3600 SAC2, 

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT 
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

^ ^

3611 ALWNI, 3655 PFLMI,

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

^

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

^ ^

3875 AAT2, 3890 BGLI,

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC

^ ^

4229 DRDI, 4236 ALWNI,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

4301 BGLI, 4308 BALI,

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

^

4345 APAI, 

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 
TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG 
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC 

4452 SMAI XMAI, 

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC 
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRDI, 4511 TTH3I,

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

^

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

^ ^

4806 PFLMI, 4807 DRA3,

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

^

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

^

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

5015 SPHI, 5035 KPNI, 
ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

5064 APAI, 5091 BALI,

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

^

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

^ ^ ^ ^

5174 NOTI, 5175 EAGI XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

^

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

^ ^ ^ ^

5380 NOTI, 5381 EAGI XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC 

5449 APAI, 

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro 
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

^ ^ ^

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

^ ^^^

5548 ALWNI, 5558 ESPI, 5564 SMAI XMAI, 5568 KPNI,

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 
ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

5650 APAI, 5696 CLAI,

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 

5724 HGIE2, 5750 KASI NARI, 5756 ECONI,

GlyGlyAlaAlaArgAlaLeuAlaHisGlyValArgValLeuGluAspGlyValAsnTyr 
5762 GGAGGCGCTGCCAGGGCCCTGGCGCATGGCGTCCGGGTTCTGGAAGACGGCGTGAACTAT
CCTCCGCGACGGTCCCGGGACCGCGTACCGCAGGCCCAAGACCTTCTGCCGCACTTGATA 

5772 BSTXI, 5775 APAI, 

AlaThrGlyAsnLeuProGlyCysSerOC AM 
5822 GCAACAGGGAACCTTCCTGGTTGCTCTTAATAGTCGAC 
CGTTGTCCCTTGGAAGGACCAACGAGAATTATCAGCTG 

5854 SALI, 
MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI,

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGCATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC 
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWNI,

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT 
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA 
SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

^ ^

550 SAC2, 560 DRDI,

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPHI,

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC 
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRDI,

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu 
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln 
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

985 STUI,

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle 
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

^

1150 NCOI,

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

1230 BSPHI, 1234 DRDI, 1237 AVA3, 1245 EAGI XMA3, 1250 DRDI,



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu 
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

^

1385 DRDI,

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

^ ^

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

^ ^

1643 BSTE2, 1677 ALWNI PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC 
ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESPI,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KASI NARI,

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer
1862 GGCGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC 
CCGCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG 

1878 SACI, 1899 BSPHI,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC 
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG 
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

2067 SMAI XMAI, 2093 DRA3, 

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG 
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWNI,

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

2164 MST2, 2220 ECONI,

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT 

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

2285 ESPI, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC

^ ^^^

2425 BSABI, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG 
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

2480 ASEI, 2497 APAI,

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG 
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG 
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

^

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG

^

2850 EAGI XMA3,

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys 
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

^ ^

2889 BALI, 2903 NHEI, 
ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

^ ^

2966 ESPI, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWNI, 3164 EAGI XMA3,

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

3332 SACI, 3346 HIND3,

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT 
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

3484 BAMHI, 3485 BSABI, 3487 BSPEI,

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG

3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

^ ^

3611 ALWNI, 3655 PFLMI,

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC 
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3,

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla 
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG 
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC 

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG 
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr 
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC 
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG 

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

A 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC

4229 DRDl, 4236 ALWNl,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG 

As. A 

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI, 

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG 

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC

4452 SMAI XMAI, 

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRDl, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI, 

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLMl, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT

A 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC 
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

A 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

A A 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly 
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

A A 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

A A A A 

5174 NOTI, 5175 EAGl XMA3, 5182 BALI, 5186 PVU2,

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

A 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA 

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC 
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

A A A A 

5380 NOTI, 5381 EAGl XMA3, 5390 AAT2, 5401 SMAI XMAI,

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG 
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC

5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA 

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2, 

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

5548 ALWNl, 5558 ESPl, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC 

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValOC AM
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCTAATAGTCGAC 
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGATTATCAGCTG 

5724 HGIE2, 5755 SALI,

MetAlaAlaTyrAlaAlaGlnGlyTyrLysValLeuValLeuAsn
2 AGCTTACAAAACAAAATGGCTGCATATGCAGCTCAGGGCTATAAGGTGCTAGTACTCAAC 
TCGAATGTTTTGTTTTACCGACGTATACGTCGAGTCCCGATATTCCACGATCATGAGTTG 

1 HIND3, 24 NDEI, 52 SCAI,

ProSerValAlaAlaThrLeuGlyPheGlyAlaTyrMetSerLysAlaHisGlyIleAsp
62 CCCTCTGTTGCTGCAACACTGGGCTTTGGTGCTTACATGTCCAAGGCTCATGGGATCGAT 
GGGAGACAACGACGTTGTGACCCGAAACCACGAATGTACAGGTTCCGAGTACCCTAGCTA 

116 CLAI, 

ProAsnIleArgThrGlyValArgThrIleThrThrGlySerProIleThrTyrSerThr
122 CCTAACATCAGGACCGGGGTGAGAACAATTACCACTGGCAGCCCCATCACGTACTCCACC
GGATTGTAGTCCTGGCCCCACTCTTGTTAATGGTGACCGTCGGGGTAGTGGATGAGGTGG 

TyrGlyLysPheLeuAlaAspGlyGlyCysSerGlyGlyAlaTyrAspIleIleIleCys
182 TACGGCAAGTTCCTTGCCGACGGCGGGTGCTCGGGGGGCGCTTATGACATAATAATTTGT
ATGCCGTTCAAGGAACGGCTGCCGCCCACGAGCCCCCCGCGAATACTGTATTATTAAACA 

AspGluCysHisSerThrAspAlaThrSerIleLeuGlyIleGlyThrValLeuAspGln
242 GACGAGTGCCACTCCACGGATGCCACATCCATCTTGGGCATTGGCACTGTCCTTGACCAA
CTGCTCACGGTGAGGTGCCTACGGTGTAGGTAGAACCCGTAACCGTGACAGGAACTGGTT

AlaGluThrAlaGlyAlaArgLeuValValLeuAlaThrAlaThrProProGlySerVal
302 GCAGAGACTGCGGGGGCGAGACTGGTTGTGCTCGCCACCGCCACCCCTCCGGGCTCCGTC
CGTCTCTGACGCCCCCGCTCTGACCAACACGAGCGGTGGCGGTGGGGAGGCCCGAGGCAG 

303 ALWNl, 

ThrValProHisProAsnIleGluGluValAlaLeuSerThrThrGlyGluIleProPhe
362 ACTGTGCCCCATCCCAACATCGAGGAGGTTGCTCTGTCCACCACCGGAGAGATCCCTTTT 
TGACACGGGGTAGGGTTGTAGCTCCTCCAACGAGACAGGTGGTGGCCTCTCTAGGGAAAA 

TyrGlyLysAlaIleProLeuGluValIleLysGlyGlyArgHisLeuIlePheCysHis
422 TACGGCAAGGCTATCCCCCTCGAAGTAATCAAGGGGGGGAGACATCTCATCTTCTGTCAT
ATGCCGTTCCGATAGGGGGAGCTTCATTAGTTCCCCCCCTCTGTAGAGTAGAAGACAGTA

SerLysLysLysCysAspGluLeuAlaAlaLysLeuValAlaLeuGlyIleAsnAlaVal
482 TCAAAGAAGAAGTGCGACGAACTCGCCGCAAAGCTGGTCGCATTGGGCATCAATGCCGTG 
AGTTTCTTCTTCACGCTGCTTGAGCGGCGTTTCGACCAGCGTAACCCGTAGTTACGGCAC 

AlaTyrTyrArgGlyLeuAspValSerValIleProThrSerGlyAspValValValVal
542 GCCTACTACCGCGGTCTTGACGTGTCCGTCATCCCGACCAGCGGCGATGTTGTCGTCGTG 
CGGATGATGGCGCCAGAACTGCACAGGCAGTAGGGCTGGTCGCCGCTACAACAGCAGCAC 

550 SAC2, 560 DRDl, 

AlaThrAspAlaLeuMetThrGlyTyrThrGlyAspPheAspSerValIleAspCysAsn
602 GCAACCGATGCCCTCATGACCGGCTATACCGGCGACTTCGACTCGGTGATAGACTGCAAT 
CGTTGGCTACGGGAGTACTGGCCGATATGGCCGCTGAAGCTGAGCCACTATCTGACGTTA

615 BSPHl, 

ThrCysValThrGlnThrValAspPheSerLeuAspProThrPheThrIleGluThrIle
662 ACGTGTGTCACCCAGACAGTCGATTTCAGCCTTGACCCTACCTTCACCATTGAGACAATC 
TGCACACAGTGGGTCTGTCAGCTAAAGTCGGAACTGGGATGGAAGTGGTAACTCTGTTAG 

ThrLeuProGlnAspAlaValSerArgThrGlnArgArgGlyArgThrGlyArgGlyLys 
722 ACGCTCCCCCAAGATGCTGTCTCCCGCACTCAACGTCGGGGCAGGACTGGCAGGGGGAAG 
TGCGAGGGGGTTCTACGACAGAGGGCGTGAGTTGCAGCCCCGTCCTGACCGTCCCCCTTC 

ProGlyIleTyrArgPheValAlaProGlyGluArgProSerGlyMetPheAspSerSer
782 CCAGGCATCTACAGATTTGTGGCACCGGGGGAGCGCCCCTCCGGCATGTTCGACTCGTCC
GGTCCGTAGATGTCTAAACACCGTGGCCCCCTCGCGGGGAGGCCGTACAAGCTGAGCAGG 

816 BGLI, 833 DRDl, 

ValLeuCysGluCysTyrAspAlaGlyCysAlaTrpTyrGluLeuThrProAlaGluThr 
842 GTCCTCTGTGAGTGCTATGACGCAGGCTGTGCTTGGTATGAGCTCACGCCCGCCGAGACT 
CAGGAGACACTCACGATACTGCGTCCGACACGAACCATACTCGAGTGCGGGCGGCTCTGA 

A 

881 SACI, 

ThrValArgLeuArgAlaTyrMetAsnThrProGlyLeuProValCysGlnAspHisLeu
902 ACAGTTAGGCTACGAGCGTACATGAACACCCCGGGGCTTCCCGTGTGCCAGGACCATCTT 
TGTCAATCCGATGCTCGCATGTACTTGTGGGGCCCCGAAGGGCACACGGTCCTGGTAGAA

A 

931 SMAI XMAI, 

GluPheTrpGluGlyValPheThrGlyLeuThrHisIleAspAlaHisPheLeuSerGln
962 GAATTTTGGGAGGGCGTCTTTACAGGCCTCACTCATATAGATGCCCACTTTCTATCCCAG 
CTTAAAACCCTCCCGCAGAAATGTCCGGAGTGAGTATATCTACGGGTGAAAGATAGGGTC 

A 

985 STUI, 

ThrLysGlnSerGlyGluAsnLeuProTyrLeuValAlaTyrGlnAlaThrValCysAla 
1022 ACAAAGCAGAGTGGGGAGAACCTTCCTTACCTGGTAGCGTACCAAGCCACCGTGTGCGCT 
TGTTTCGTCTCACCCCTCTTGGAAGGAATGGACCATCGCATGGTTCGGTGGCACACGCGA 

A 

1069 DRA3, 

ArgAlaGlnAlaProProProSerTrpAspGlnMetTrpLysCysLeuIleArgLeuLys 
1082 AGGGCTCAAGCCCCTCCCCCATCGTGGGACCAGATGTGGAAGTGTTTGATTCGCCTCAAG
TCCCGAGTTCGGGGAGGGGGTAGCACCCTGGTCTACACCTTCACAAACTAAGCGGAGTTC

ProThrLeuHisGlyProThrProLeuLeuTyrArgLeuGlyAlaValGlnAsnGluIle
1142 CCCACCCTCCATGGGCCAACACCCCTGCTATACAGACTGGGCGCTGTTCAGAATGAAATC 
GGGTGGGAGGTACCCGGTTGTGGGGACGATATGTCTGACCCGCGACAAGTCTTACTTTAG 

1150 NCOI, 

ThrLeuThrHisProValThrLysTyrIleMetThrCysMetSerAlaAspLeuGluVal
1202 ACCCTGACGCACCCAGTCACCAAATACATCATGACATGCATGTCGGCCGACCTGGAGGTC
TGGGACTGCGTGGGTCAGTGGTTTATGTAGTACTGTACGTACAGCCGGCTGGACCTCCAG 

AAA A ^ 

1230 BSPHl, 1234 DRDl, 1237 AVA3, 1245 EAGl XMA3, 1250 DRDl, 



ValThrSerThrTrpValLeuValGlyGlyValLeuAlaAlaLeuAlaAlaTyrCysLeu
1262 GTCACGAGCACCTGGGTGCTCGTTGGCGGCGTCCTGGCTGCTTTGGCCGCGTATTGCCTG 
CAGTGCTCGTGGACCCACGAGCAACCGCCGCAGGACCGACGAAACCGGCGCATAACGGAC 

SerThrGlyCysValValIleValGlyArgValValLeuSerGlyLysProAlaIleIle
1322 TCAACAGGCTGCGTGGTCATAGTGGGCAGGGTCGTCTTGTCCGGGAAGCCGGCAATCATA 
AGTTGTCCGACGCACCAGTATCACCCGTCCCAGCAGAACAGGCCCTTCGGCCGTTAGTAT 

A 

1369 NAEI, 

ProAspArgGluValLeuTyrArgGluPheAspGluMetGluGluCysSerGlnHisLeu 
1382 CCTGACAGGGAAGTCCTCTACCGAGAGTTCGATGAGATGGAAGAGTGCTCTCAGCACTTA 
GGACTGTCCCTTCAGGAGATGGCTCTCAAGCTACTCTACCTTCTCACGAGAGTCGTGAAT 

Ay 

1385 DRDl, 

ProTyrIleGluGlnGlyMetMetLeuAlaGluGlnPheLysGlnLysAlaLeuGlyLeu
1442 CCGTACATCGAGCAAGGGATGATGCTCGCCGAGCAGTTCAAGCAGAAGGCCCTCGGCCTC
GGCATGTAGCTCGTTCCCTACTACGAGCGGCTCGTCAAGTTCGTCTTCCGGGAGCCGGAG 

LeuGlnThrAlaSerArgGlnAlaGluValIleAlaProAlaValGlnThrAsnTrpGln
1502 CTGCAGACCGCGTCCCGTCAGGCAGAGGTTATCGCCCCTGCTGTCCAGACCAACTGGCAA 
GACGTCTGGCGCAGGGCAGTCCGTCTCCAATAGCGGGGACGACAGGTCTGGTTGACCGTT 

A A 

1502 PSTI, 1507 TTH3I, 

LysLeuGluThrPheTrpAlaLysHisMetTrpAsnPheIleSerGlyIleGlnTyrLeu
1562 AAACTCGAGACCTTCTGGGCGAAGCATATGTGGAACTTCATCAGTGGGATACAATACTTG 
TTTGAGCTCTGGAAGACCCGCTTCGTATACACCTTGAAGTAGTCACCCTATGTTATGAAC 

A A 

1565 XHOI, 1586 NDEI, 

AlaGlyLeuSerThrLeuProGlyAsnProAlaIleAlaSerLeuMetAlaPheThrAla
1622 GCGGGCTTGTCAACGCTGCCTGGTAACCCCGCCATTGCTTCATTGATGGCTTTTACAGCT 
CGCCCGAACAGTTGCGACGGACCATTGGGGCGGTAACGAAGTAACTACCGAAAATGTCGA 

A ^ 

1643 BSTE2, 1677 ALWNl PVU2,

AlaValThrSerProLeuThrThrSerGlnThrLeuLeuPheAsnIleLeuGlyGlyTrp
1682 GCTGTCACCAGCCCACTAACCACTAGCCAAACCCTCCTCTTCAACATATTGGGGGGGTGG 
CGACAGTGGTCGGGTGATTGGTGATCGGTTTGGGAGGAGAAGTTGTATAACCCCCCCACC

ValAlaAlaGlnLeuAlaAlaProGlyAlaAlaThrAlaPheValGlyAlaGlyLeuAla
1742 GTGGCTGCCCAGCTCGCCGCCCCCGGTGCCGCTACTGCCTTTGTGGGCGCTGGCTTAGCT 
CACCGACGGGTCGAGCGGCGGGGGCCACGGCGATGACGGAAACACCCGCGACCGAATCGA 

1794 ESPl,

GlyAlaAlaIleGlySerValGlyLeuGlyLysValLeuIleAspIleLeuAlaGlyTyr
1802 GGCGCCGCCATCGGCAGTGTTGGACTGGGGAAGGTCCTCATAGACATCCTTGCAGGGTAT 
CCGCGGCGGTAGCCGTCACAACCTGACCCCTTCCAGGAGTATCTGTAGGAACGTCCCATA 

1802 KASl NARI, 

GlyAlaGlyValAlaGlyAlaLeuValAlaPheLysIleMetSerGlyGluValProSer
1862 GGGGCGGGCGTGGCGGGAGCTCTTGTGGCATTCAAGATCATGAGCGGTGAGGTCCCCTCC
CCCCGCCCGCACCGCCCTCGAGAACACCGTAAGTTCTAGTACTCGCCACTCCAGGGGAGG

1878 SACI, 1899 BSPHl,

ThrGluAspLeuValAsnLeuLeuProAlaIleLeuSerProGlyAlaLeuValValGly
1922 ACGGAGGACCTGGTCAATCTACTGCCCGCCATCCTCTCGCCCGGAGCCCTCGTAGTCGGC
TGCCTCCTGGACCAGTTAGATGACGGGCGGTAGGAGAGCGGGCCTCGGGAGCATCAGCCG 

1928 TTH3I, 

ValValCysAlaAlaIleLeuArgArgHisValGlyProGlyGluGlyAlaValGlnTrp
1982 GTGGTCTGTGCAGCAATACTGCGCCGGCACGTTGGCCCGGGCGAGGGGGCAGTGCAGTGG 
CACCAGACACGTCGTTATGACGCGGCCGTGCAACCGGGCCCGCTCCCCCGTCACGTCACC 

2004 NAEI, 2017 SMAI XMAI, 

MetAsnArgLeuIleAlaPheAlaSerArgGlyAsnHisValSerProThrHisTyrVal 
2042 ATGAACCGGCTGATAGCCTTCGCCTCCCGGGGGAACCATGTTTCCCCCACGCACTACGTG
TACTTGGCCGACTATCGGAAGCGGAGGGCCCCCTTGGTACAAAGGGGGTGCGTGATGCAC 

2067 SMAI XMAI, 2093 DRA3,

ProGluSerAspAlaAlaAlaArgValThrAlaIleLeuSerSerLeuThrValThrGln
2102 CCGGAGAGCGATGCAGCTGCCCGCGTCACTGCCATACTCAGCAGCCTCACTGTAACCCAG
GGCCTCTCGCTACGTCGACGGGCGCAGTGACGGTATGAGTCGTCGGAGTGACATTGGGTC 

2115 PVU2, 2159 ALWNl, 

LeuLeuArgArgLeuHisGlnTrpIleSerSerGluCysThrThrProCysSerGlySer 
2162 CTCCTGAGGCGACTGCACCAGTGGATAAGCTCGGAGTGTACCACTCCATGCTCCGGTTCC 
GAGGACTCCGCTGACGTGGTCACCTATTCGAGCCTCACATGGTGAGGTACGAGGCCAAGG 

A 

2164 MST2, 2220 ECONl, 

TrpLeuArgAspIleTrpAspTrpIleCysGluValLeuSerAspPheLysThrTrpLeu 
2222 TGGCTAAGGGACATCTGGGACTGGATATGCGAGGTGTTGAGCGACTTTAAGACCTGGCTA 
ACCGATTCCCTGTAGACCCTGACCTATACGCTCCACAACTCGCTGAAATTCTGGACCGAT

LysAlaLysLeuMetProGlnLeuProGlyIleProPheValSerCysGlnArgGlyTyr
2282 AAAGCTAAGCTCATGCCACAGCTGCCTGGGATCCCCTTTGTGTCCTGCCAGCGCGGGTAT
TTTCGATTCGAGTACGGTGTCGACGGACCCTAGGGGAAACACAGGACGGTCGCGCCCATA 

A A 

2285 ESPl, 2300 PVU2, 2310 BAMHI,

LysGlyValTrpArgGlyAspGlyIleMetHisThrArgCysHisCysGlyAlaGluIle
2342 AAGGGGGTCTGGCGAGGGGACGGCATCATGCACACTCGCTGCCACTGTGGAGCTGAGATC 
TTCCCCCAGACCGCTCCCCTGCCGTAGTACGTGTGAGCGACGGTGACACCTCGACTCTAG 

ThrGlyHisValLysAsnGlyThrMetArgIleValGlyProArgThrCysArgAsnMet
2402 ACTGGACATGTCAAAAACGGGACGATGAGGATCGTCGGTCCTAGGACCTGCAGGAACATG
TGACCTGTACAGTTTTTGCCCTGCTACTCCTAGCAGCCAGGATCCTGGACGTCCTTGTAC 

A AAA 

2425 BSABl, 2441 AVR2, 2448 SSE8387I, 2449 PSTI,

TrpSerGlyThrPheProIleAsnAlaTyrThrThrGlyProCysThrProLeuProAla 
2462 TGGAGTGGGACCTTCCCCATTAATGCCTACACCACGGGCCCCTGTACCCCCCTTCCTGCG
ACCTCACCCTGGAAGGGGTAATTACGGATGTGGTGCCCGGGGACATGGGGGGAAGGACGC 

A A 

2480 ASEl, 2497 APAI, 

ProAsnTyrThrPheAlaLeuTrpArgValSerAlaGluGluTyrValGluIleArgGln 
2522 CCGAACTACACGTTCGCGCTATGGAGGGTGTCTGCAGAGGAATACGTGGAGATAAGGCAG
GGCTTGATGTGCAAGCGCGATACCTCCCACAGACGTCTCCTTATGCACCTCTATTCCGTC 

2553 PSTI, 

ValGlyAspPheHisTyrValThrGlyMetThrThrAspAsnLeuLysCysProCysGln 
2582 GTGGGGGACTTCCACTACGTGACGGGTATGACTACTGACAATCTTAAATGCCCGTGCCAG 
CACCCCCTGAAGGTGATGCACTGCCCATACTGATGACTGTTAGAATTTACGGGCACGGTC 

2594 DRA3, 

ValProSerProGluPhePheThrGluLeuAspGlyValArgLeuHisArgPheAlaPro 
2642 GTCCCATCGCCCGAATTTTTCACAGAATTGGACGGGGTGCGCCTACATAGGTTTGCGCCC 
CAGGGTAGCGGGCTTAAAAAGTGTCTTAACCTGCCCCACGCGGATGTATCCAAACGCGGG 

ProCysLysProLeuLeuArgGluGluValSerPheArgValGlyLeuHisGluTyrPro 
2702 CCCTGCAAGCCCTTGCTGCGGGAGGAGGTATCATTCAGAGTAGGACTCCACGAATACCCG 
GGGACGTTCGGGAACGACGCCCTCCTCCATAGTAAGTCTCATCCTGAGGTGCTTATGGGC 

2757 HGIE2, 

ValGlySerGlnLeuProCysGluProGluProAspValAlaValLeuThrSerMetLeu 
2762 GTAGGGTCGCAATTACCTTGCGAGCCCGAACCGGACGTGGCCGTGTTGACGTCCATGCTC
CATCCCAGCGTTAATGGAACGCTCGGGCTTGGCCTGCACCGGCACAACTGCAGGTACGAG 

A 

2809 AAT2, 

ThrAspProSerHisIleThrAlaGluAlaAlaGlyArgArgLeuAlaArgGlySerPro 
2822 ACTGATCCCTCCCATATAACAGCAGAGGCGGCCGGGCGAAGGTTGGCGAGGGGATCACCC
TGACTAGGGAGGGTATATTGTCGTCTCCGCCGGCCCGCTTCCAACCGCTCCCCTAGTGGG 

A 

2850 EAGl XMA3, 

ProSerValAlaSerSerSerAlaSerGlnLeuSerAlaProSerLeuLysAlaThrCys
2882 CCCTCTGTGGCCAGCTCCTCGGCTAGCCAGCTATCCGCTCCATCTCTCAAGGCAACTTGC 
GGGAGACACCGGTCGAGGAGCCGATCGGTCGATAGGCGAGGTAGAGAGTTCCGTTGAACG 

A A 

2889 BALI, 2903 NHEI,

ThrAlaAsnHisAspSerProAspAlaGluLeuIleGluAlaAsnLeuLeuTrpArgGln
2942 ACCGCTAACCATGACTCCCCTGATGCTGAGCTCATAGAGGCCAACCTCCTATGGAGGCAG 
TGGCGATTGGTACTGAGGGGACTACGACTCGAGTATCTCCGGTTGGAGGATACCTCCGTC 

2966 ESPl, 2969 SACI,

GluMetGlyGlyAsnIleThrArgValGluSerGluAsnLysValValIleLeuAspSer
3002 GAGATGGGCGGCAACATCACCAGGGTTGAGTCAGAAAACAAAGTGGTGATTCTGGACTCC 
CTCTACCCGCCGTTGTAGTGGTCCCAACTCAGTCTTTTGTTTCACCACTAAGACCTGAGG 

PheAspProLeuValAlaGluGluAspGluArgGluIleSerValProAlaGluIleLeu 
3062 TTCGATCCGCTTGTGGCGGAGGAGGACGAGCGGGAGATCTCCGTACCCGCAGAAATCCTG 
AAGCTAGGCGAACACCGCCTCCTCCTGCTCGCCCTCTAGAGGCATGGGCGTCTTTAGGAC 

3096 BGL2, 

ArgLysSerArgArgPheAlaGlnAlaLeuProValTrpAlaArgProAspTyrAsnPro 
3122 CGGAAGTCTCGGAGATTCGCCCAGGCCCTGCCCGTTTGGGCGCGGCCGGACTATAACCCC 
GCCTTCAGAGCCTCTAAGCGGGTCCGGGACGGGCAAACCCGCGCCGGCCTGATATTGGGG 

3143 ALWNl, 3164 EAGl XMA3, 

ProLeuValGluThrTrpLysLysProAspTyrGluProProValValHisGlyCysPro 
3182 CCGCTAGTGGAGACGTGGAAAAAGCCCGACTACGAACCACCTGTGGTCCATGGCTGCCCG 
GGCGATCACCTCTGCACCTTTTTCGGGCTGATGCTTGGTGGACACCAGGTACCGACGGGC 

3217 HGIE2, 3229 NCOI, 

LeuProProProLysSerProProValProProProArgLysLysArgThrValValLeu 
3242 CTTCCACCTCCAAAGTCCCCTCCTGTGCCTCCGCCTCGGAAGAAGCGGACGGTGGTCCTC
GAAGGTGGAGGTTTCAGGGGAGGACACGGAGGCGGAGCCTTCTTCGCCTGCCACCAGGAG 

ThrGluSerThrLeuSerThrAlaLeuAlaGluLeuAlaThrArgSerPheGlySerSer 
3302 ACTGAATCAACCCTATCTACTGCCTTGGCCGAGCTCGCCACCAGAAGCTTTGGCAGCTCC 
TGACTTAGTTGGGATAGATGACGGAACCGGCTCGAGCGGTGGTCTTCGAAACCGTCGAGG 

3332 SACI, 3346 HIND3, 

SerThrSerGlyIleThrGlyAspAsnThrThrThrSerSerGluProAlaProSerGly
3362 TCAACTTCCGGCATTACGGGCGACAATACGACAACATCCTCTGAGCCCGCCCCTTCTGGC 
AGTTGAAGGCCGTAATGCCCGCTGTTATGCTGTTGTAGGAGACTCGGGCGGGGAAGACCG 

CysProProAspSerAspAlaGluSerTyrSerSerMetProProLeuGluGlyGluPro 
3422 TGCCCCCCCGACTCCGACGCTGAGTCCTATTCCTCCATGCCCCCCCTGGAGGGGGAGCCT 
ACGGGGGGGCTGAGGCTGCGACTCAGGATAAGGAGGTACGGGGGGGACCTCCCCCTCGGA 

3437 EAM1105I,

GlyAspProAspLeuSerAspGlySerTrpSerThrValSerSerGluAlaAsnAlaGlu 
3482 GGGGATCCGGATCTTAGCGACGGGTCATGGTCAACGGTCAGTAGTGAGGCCAACGCGGAG
CCCCTAGGCCTAGAATCGCTGCCCAGTACCAGTTGCCAGTCATCACTCCGGTTGCGCCTC 

3484 BAMHI, 3485 BSABl, 3487 BSPEl, 

AspValValCysCysSerMetSerTyrSerTrpThrGlyAlaLeuValThrProCysAla 
3542 GATGTCGTGTGCTGCTCAATGTCTTACTCTTGGACAGGCGCACTCGTCACCCCGTGCGCC 
CTACAGCACACGACGAGTTACAGAATGAGAACCTGTCCGCGTGAGCAGTGGGGCACGCGG

3589 DRA3, 3600 SAC2,

AlaGluGluGlnLysLeuProIleAsnAlaLeuSerAsnSerLeuLeuArgHisHisAsn 
3602 GCGGAAGAACAGAAACTGCCCATCAATGCACTAAGCAACTCGTTGCTACGTCACCACAAT
CGCCTTCTTGTCTTTGACGGGTAGTTACGTGATTCGTTGAGCAACGATGCAGTGGTGTTA 

3611 ALWNl, 3655 PFLMl, 

LeuValTyrSerThrThrSerArgSerAlaCysGlnArgGlnLysLysValThrPheAsp 
3662 TTGGTGTATTCCACCACCTCACGCAGTGCTTGCCAAAGGCAGAAGAAAGTCACATTTGAC
AACCACATAAGGTGGTGGAGTGCGTCACGAACGGTTTCCGTCTTCTTTCAGTGTAAACTG 

3681 DRA3, 

ArgLeuGlnValLeuAspSerHisTyrGlnAspValLeuLysGluValLysAlaAlaAla
3722 AGACTGCAAGTTCTGGACAGCCATTACCAGGACGTACTCAAGGAGGTTAAAGCAGCGGCG 
TCTGACGTTCAAGACCTGTCGGTAATGGTCCTGCATGAGTTCCTCCAATTTCGTCGCCGC 

SerLysValLysAlaAsnLeuLeuSerValGluGluAlaCysSerLeuThrProProHis 
3782 TCAAAAGTGAAGGCTAACTTGCTATCCGTAGAGGAAGCTTGCAGCCTGACGCCCCCACAC 
AGTTTTCACTTCCGATTGAACGATAGGCATCTCCTTCGAACGTCGGACTGCGGGGGTGTG 

A 

3816 HIND3, 

SerAlaLysSerLysPheGlyTyrGlyAlaLysAspValArgCysHisAlaArgLysAla
3842 TCAGCCAAATCCAAGTTTGGTTATGGGGCAAAAGACGTCCGTTGCCATGCCAGAAAGGCC 
AGTCGGTTTAGGTTCAAACCAATACCCCGTTTTCTGCAGGCAACGGTACGGTCTTTCCGG 

3875 AAT2, 3890 BGLI, 

ValThrHisIleAsnSerValTrpLysAspLeuLeuGluAspAsnValThrProIleAsp 
3902 GTAACCCACATCAACTCCGTGTGGAAAGACCTTCTGGAAGACAATGTAACACCAATAGAC 
CATTGGGTGTAGTTGAGGCACACCTTTCTGGAAGACCTTCTGTTACATTGTGGTTATCTG 

ThrThrIleMetAlaLysAsnGluValPheCysValGlnProGluLysGlyGlyArgLys
3962 ACTACCATCATGGCTAAGAACGAGGTTTTCTGCGTTCAGCCTGAGAAGGGGGGTCGTAAG
TGATGGTAGTACCGATTCTTGCTCCAAAAGACGCAAGTCGGACTCTTCCCCCCAGCATTC

ProAlaArgLeuIleValPheProAspLeuGlyValArgValCysGluLysMetAlaLeu 
4022 CCAGCTCGTCTCATCGTGTTCCCCGATCTGGGCGTGCGCGTGTGCGAAAAGATGGCTTTG
GGTCGAGCAGAGTAGCACAAGGGGCTAGACCCGCACGCGCACACGCTTTTCTACCGAAAC 

TyrAspValValThrLysLeuProLeuAlaValMetGlySerSerTyrGlyPheGlnTyr
4082 TACGACGTGGTTACAAAGCTCCCCTTGGCCGTGATGGGAAGCTCCTACGGATTCCAATAC
ATGCTGCACCAATGTTTCGAGGGGAACCGGCACTACCCTTCGAGGATGCCTAAGGTTATG

SerProGlyGlnArgValGluPheLeuValGlnAlaTrpLysSerLysLysThrProMet 
4142 TCACCAGGACAGCGGGTTGAATTCCTCGTGCAAGCGTGGAAGTCCAAGAAAACCCCAATG 
AGTGGTCCTGTCGCCCAACTTAAGGAGCACGTTCGCACCTTCAGGTTCTTTTGGGGTTAC 

A 

4160 ECORI, 

GlyPheSerTyrAspThrArgCysPheAspSerThrValThrGluSerAspIleArgThr 
4202 GGGTTCTCGTATGATACCCGCTGCTTTGACTCCACAGTCACTGAGAGCGACATCCGTACG 
CCCAAGAGCATACTATGGGCGACGAAACTGAGGTGTCAGTGACTCTCGCTGTAGGCATGC

4229 DRDl, 4236 ALWNl,

GluGluAlaIleTyrGlnCysCysAspLeuAspProGlnAlaArgValAlaIleLysSer
4262 GAGGAGGCAATCTACCAATGTTGTGACCTCGACCCCCAAGCCCGCGTGGCCATCAAGTCC
CTCCTCCGTTAGATGGTTACAACACTGGAGCTGGGGGTTCGGGCGCACCGGTAGTTCAGG

4301 BGLI, 4308 BALI, 

LeuThrGluArgLeuTyrValGlyGlyProLeuThrAsnSerArgGlyGluAsnCysGly 
4322 CTCACCGAGAGGCTTTATGTTGGGGGCCCTCTTACCAATTCAAGGGGGGAGAACTGCGGC 
GAGTGGCTCTCCGAAATACAACCCCCGGGAGAATGGTTAAGTTCCCCCCTCTTGACGCCG 

4345 APAI,

TyrArgArgCysArgAlaSerGlyValLeuThrThrSerCysGlyAsnThrLeuThrCys 
4382 TATCGCAGGTGCCGCGCGAGCGGCGTACTGACAACTAGCTGTGGTAACACCCTCACTTGC
ATAGCGTCCACGGCGCGCTCGCCGCATGACTGTTGATCGACACCATTGTGGGAGTGAACG

TyrIleLysAlaArgAlaAlaCysArgAlaAlaGlyLeuGlnAspCysThrMetLeuVal
4442 TACATCAAGGCCCGGGCAGCCTGTCGAGCCGCAGGGCTCCAGGACTGCACCATGCTCGTG
ATGTAGTTCCGGGCCCGTCGGACAGCTCGGCGTCCCGAGGTCCTGACGTGGTACGAGCAC

4452 SMAI XMAI,

CysGlyAspAspLeuValValIleCysGluSerAlaGlyValGlnGluAspAlaAlaSer
4502 TGTGGCGACGACTTAGTCGTTATCTGTGAAAGCGCGGGGGTCCAGGAGGACGCGGCGAGC
ACACCGCTGCTGAATCAGCAATAGACACTTTCGCGCCCCCAGGTCCTCCTGCGCCGCTCG 

4508 DRDl, 4511 TTH3I, 

LeuArgAlaPheThrGluAlaMetThrArgTyrSerAlaProProGlyAspProProGln 
4562 CTGAGAGCCTTCACGGAGGCTATGACCAGGTACTCCGCCCCCCCTGGGGACCCCCCACAA
GACTCTCGGAAGTGCCTCCGATACTGGTCCATGAGGCGGGGGGGACCCCTGGGGGGTGTT 

ProGluTyrAspLeuGluLeuIleThrSerCysSerSerAsnValSerValAlaHisAsp 
4622 CCAGAATACGACTTGGAGCTCATAACATCATGCTCCTCCAACGTGTCAGTCGCCCACGAC
GGTCTTATGCTGAACCTCGAGTATTGTAGTACGAGGAGGTTGCACAGTCAGCGGGTGCTG 

4637 SACI,

GlyAlaGlyLysArgValTyrTyrLeuThrArgAspProThrThrProLeuAlaArgAla 
4682 GGCGCTGGAAAGAGGGTCTACTACCTCACCCGTGACCCTACAACCCCCCTCGCGAGAGCT
CCGCGACCTTTCTCCCAGATGATGGAGTGGGCACTGGGATGTTGGGGGGAGCGCTCTCGA 

4731 NRUI,

AlaTrpGluThrAlaArgHisThrProValAsnSerTrpLeuGlyAsnIleIleMetPhe
4742 GCGTGGGAGACAGCAAGACACACTCCAGTCAATTCCTGGCTAGGCAACATAATCATGTTT
CGCACCCTCTGTCGTTCTGTGTGAGGTCAGTTAAGGACCGATCCGTTGTATTAGTACAAA 

AlaProThrLeuTrpAlaArgMetIleLeuMetThrHisPhePheSerValLeuIleAla
4802 GCCCCCACACTGTGGGCGAGGATGATACTGATGACCCATTTCTTTAGCGTCCTTATAGCC
CGGGGGTGTGACACCCGCTCCTACTATGACTACTGGGTAAAGAAATCGCAGGAATATCGG 

4806 PFLMl, 4807 DRA3, 

ArgAspGlnLeuGluGlnAlaLeuAspCysGluIleTyrGlyAlaCysTyrSerIleGlu
4862 AGGGACCAGCTTGAACAGGCCCTCGATTGCGAGATCTACGGGGCCTGCTACTCCATAGAA
TCCCTGGTCGAACTTGTCCGGGAGCTAACGCTCTAGATGCCCCGGACGATGAGGTATCTT 

4893 BGL2, 

ProLeuAspLeuProProIleIleGlnArgLeuHisGlyLeuSerAlaPheSerLeuHis
4922 CCACTGGATCTACCTCCAATCATTCAAAGACTCCATGGCCTCAGCGCATTTTCACTCCAC
GGTGACCTAGATGGAGGTTAGTAAGTTTCTGAGGTACCGGAGTCGCGTAAAAGTGAGGTG 

4954 NCOI,

SerTyrSerProGlyGluIleAsnArgValAlaAlaCysLeuArgLysLeuGlyValPro 
4982 AGTTACTCTCCAGGTGAAATCAATAGGGTGGCCGCATGCCTCAGAAAACTTGGGGTACCG
TCAATGAGAGGTCCACTTTAGTTATCCCACCGGCGTACGGAGTCTTTTGAACCCCATGGC 

5015 SPHI, 5035 KPNI, 

ProLeuArgAlaTrpArgHisArgAlaArgSerValArgAlaArgLeuLeuAlaArgGly
5042 CCCTTGCGAGCTTGGAGACACCGGGCCCGGAGCGTCCGCGCTAGGCTTCTGGCCAGAGGA 
GGGAACGCTCGAACCTCTGTGGCCCGGGCCTCGCAGGCGCGATCCGAAGACCGGTCTCCT 

5064 APAI, 5091 BALI, 

GlyArgAlaAlaIleCysGlyLysTyrLeuPheAsnTrpAlaValArgThrLysLeuLys
5102 GGCAGGGCTGCCATATGTGGCAAGTACCTCTTCAACTGGGCAGTAAGAACAAAGCTCAAA 
CCGTCCCGACGGTATACACCGTTCATGGAGAAGTTGACCCGTCATTCTTGTTTCGAGTTT 

5113 NDEI, 

LeuThrProIleAlaAlaAlaGlyGlnLeuAspLeuSerGlyTrpPheThrAlaGlyTyr 
5162 CTCACTCCAATAGCGGCCGCTGGCCAGCTGGACTTGTCCGGCTGGTTCACGGCTGGCTAC 
GAGTGAGGTTATCGCCGGCGACCGGTCGACCTGAACAGGCCGACCAAGTGCCGACCGATG 

5174 NOTI, 5175 EAGl XMA3, 5182 BALI, 5186 PVU2, 

SerGlyGlyAspIleTyrHisSerValSerHisAlaArgProArgTrpIleTrpPheCys 
5222 AGCGGGGGAGACATTTATCACAGCGTGTCTCATGCCCGGCCCCGCTGGATCTGGTTTTGC 
TCGCCCCCTCTGTAAATAGTGTCGCACAGAGTACGGGCCGGGGCGACCTAGACCAAAACG 

5240 DRA3, 

LeuLeuLeuLeuAlaAlaGlyValGlyIleTyrLeuLeuProAsnArgMetSerThrAsn
5282 CTACTCCTGCTTGCTGCAGGGGTAGGCATCTACCTCCTCCCCAACCGAATGAGCACGAAT
GATGAGGACGAACGACGTCCCCATCCGTAGATGGAGGAGGGGTTGGCTTACTCGTGCTTA

5295 PSTI, 

ProLysProGlnArgLysThrLysArgAsnThrAsnArgArgProGlnAspValLysPhe 
5342 CCTAAACCTCAAAGAAAGACCAAACGTAACACCAACCGGCGGCCGCAGGACGTCAAGTTC
GGATTTGGAGTTTCTTTCTGGTTTGCATTGTGGTTGGCCGCCGGCGTCCTGCAGTTCAAG 

5380 NOTI, 5381 EAGl XMA3, 5390 AAT2, 5401 SMAI XMAI, 

ProGlyGlyGlyGlnIleValGlyGlyValTyrLeuLeuProArgArgGlyProArgLeu
5402 CCGGGTGGCGGTCAGATCGTTGGTGGAGTTTACTTGTTGCCGCGCAGGGGCCCTAGATTG 
GGCCCACCGCCAGTCTAGCAACCACCTCAAATGAACAACGGCGCGTCCCCGGGATCTAAC

5449 APAI,

GlyValArgAlaThrArgLysThrSerGluArgSerGlnProArgGlyArgArgGlnPro
5462 GGTGTGCGCGCGACGAGAAAGACTTCCGAGCGGTCGCAACCTCGAGGTAGACGTCAGCCT
CCACACGCGCGCTGCTCTTTCTGAAGGCTCGCCAGCGTTGGAGCTCCATCTGCAGTCGGA

5467 BSSH2, 5478 XMNI, 5502 XHOI, 5511 AAT2,

IleProLysAlaArgArgProGluGlyArgThrTrpAlaGlnProGlyTyrProTrpPro 
5522 ATCCCCAAGGCTCGTCGGCCCGAGGGCAGGACCTGGGCTCAGCCCGGGTACCCTTGGCCC 
TAGGGGTTCCGAGCAGCCGGGCTCCCGTCCTGGACCCGAGTCGGGCCCATGGGAACCGGG 

5548 ALWNl, 5558 ESPl, 5564 SMAI XMAI, 5568 KPNI, 

LeuTyrGlyAsnGluGlyCysGlyTrpAlaGlyTrpLeuLeuSerProArgGlySerArg 
5582 CTCTATGGCAATGAGGGCTGCGGGTGGGCGGGATGGCTCCTGTCTCCCCGTGGCTCTCGG 
GAGATACCGTTACTCCCGACGCCCACCCGCCCTACCGAGGACAGAGGGGCACCGAGAGCC

ProSerTrpGlyProThrAspProArgArgArgSerArgAsnLeuGlyLysValIleAsp
5642 CCTAGCTGGGGCCCCACAGACCCCCGGCGTAGGTCGCGCAATTTGGGTAAGGTCATCGAT 
GGATCGACCCCGGGGTGTCTGGGGGCCGCATCCAGCGCGTTAAACCCATTCCAGTAGCTA 

5650 APAI, 5696 CLAI, 

ThrLeuThrCysGlyPheAlaAspLeuMetGlyTyrIleProLeuValGlyAlaProLeu
5702 ACCCTTACGTGCGGCTTCGCCGACCTCATGGGGTACATACCGCTCGTCGGCGCCCCTCTT 
TGGGAATGCACGCCGAAGCGGCTGGAGTACCCCATGTATGGCGAGCAGCCGCGGGGAGAA 

5724 HGIE2, 5750 KASl NARI, 5756 ECONl, 

GlyGlyAlaAlaArgAlaOC AM 
5762 GGAGGCGCTGCCAGGGCCTAATAGTCGAC
CCTCCGCGACGGTCCCGGATTATCAGCTG 

5785 SALI,

SEQUENCE LISTING

<110> CHIRON CORPORATION et al.

<120> NOVEL HCV NON-STRUCTURAL POLYPEPTIDE

<130> PP01617.003 

<140> 
<141> 

<160> 19 

<170> PatentIn Ver. 2.0

<210> 1 

<211> 9620 

<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1990)..(7302)
<220> 

<223> Description of Artificial Sequence: Hepatitis C pns345 



<400> 1 
cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720
cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780
tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840
agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900
aaaccaacgg gaccuuccaa aatgtcgtaa taaccccgcc ccgttgacgc 960
aaacgggcgg taggcgtgta cggtgggagg cctatataag cagagctcgt ttagtgaacc 1020
gccagaccgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080
gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140
acgtaagcac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200
cgtctttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260
agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320
ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380
actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440
atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500
agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560
gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620
gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680
gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740
ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800
agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860
gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920
acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980



gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu
1 5 10

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys
15 20 25 30

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr
35 40 45

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp
50 55 60

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys
65 70 75

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp
80 85 90

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr
95 100 105 110

cct ccg ggc tec gtc act gtg ccc cat ccc aac ate gag gag gtt get 2367 
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn He Glu Glu Val Ala 
115 120 125 

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu
130 135 140 

gaa gta ate aag ggg ggg aga cat etc ate ttc tgt eat tea aag aag 2463 
Glu Val He Lys Gly Gly Arg His Leu He Phe Cys His Ser Lys Lys 
145 150 155 

aag tgc gac gaa etc gee gca aag ctg gtc gca ttg ggc ate aat gcc 2511 
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly He Asn Ala 
160 165 170 

gtg gee tac tac cgc ggt ett gac gtg tec gtc ate ccg acc age ggc 2559 
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val He Pro Thr Ser Gly 
175 180 185 190 

gat gtt gtc gtc gtg gca ace gat gee etc atg acc ggc tat acc ggc 2607 
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly 
195 200 205 

gac ttc gac teg gtg ata gac tgc aat acg tgt gtc ace eag aca gte 2655 
Asp Phe Asp Ser Val He Asp Cys Asn Thr Cys Val Thr Gin Thr Val 
210 215 220 

gat ttc age ett gac cct ace ttc ace att gag aca ate acg etc cee 2703 
Asp Phe Ser Leu Asp Pro Thr Phe Thr He Glu Thr He Thr Leu Pro 
225 230 235 

caa gat get gtc tec ege act caa cgt egg ggc agg act ggc agg ggg 2751 
Gin Asp Ala Val Ser Arg Thr Gin Arg Arg Gly Arg Thr Gly Arg Gly 
240 245 250 

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly
255 260 265 270 

atg ttc gac teg tec gte etc tgt gag tgc tat gac gca ggc tgt get 2847 
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala 
275 280 285 

tgg tat gag etc acg ccc gee gag act aca gtt agg eta cga gcg tac 2895 
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr 
290 295 300 

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp
305 310 315 

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser
320 325 330

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln
335 340 345 350

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln
355 360 365

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr
370 375 380 

ccc ctg eta tac aga ctg ggc get gtt cag aat gaa ate acc ctg acg 3183 
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gin Asn Glu lie Thr Leu Thr 
385 390 395 

cac cca gtc acc aaa tac ate atg aca tgc atg teg gee gac ctg gag 3231 
His Pro Val Thr Lys Tyr He Met Thr Cys Met Ser Ala Asp Leu Glu 
400 405 410 

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu
415 420 425 430

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val
435 440 445

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr
450 455 460 

ega gag ttc gat gag atg gaa gag tgc tet cag cac tta ecg tac ate 3423 
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gin His Leu Pro Tyr He 
465 470 475 

gag caa ggg atg atg etc gee gag cag ttc aag cag aag gee etc ggc 3471 
Glu Gin Gly Met Met Leu Ala Glu Gin Phe Lys Gin Lys Ala Leu Gly 
480 485 490 

etc ctg cag acc gcg tec cgt cag gea gag gtt ate gcc cct get gtc 3519 
Leu Leu Gin Thr Ala Ser Arg Gin Ala Glu Val He Ala Pro Ala Val 
495 500 505 510 

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp
515 520 525

aac ttc ate agt ggg ata caa tac ttg gcg ggc ttg tea acg ctg cct 3615 
Asn Phe He Ser Gly He Gin Tyr Leu Ala Gly Leu Ser Thr Leu Pro 
530 535 540 

ggt aac ccc gcc att get tea ttg atg get ttt aca get get gtc acc 3663 
Gly Asn Pro Ala He Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr 
545 550 555 

age cca eta acc act age caa acc etc etc ttc aac ata ttg ggg ggg 3711 
Ser Pro Leu Thr Thr Ser Gin Thr Leu Leu Phe Asn He Leu Gly Gly 
560 565 570 
tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val
575 580 585 590 

ggc get ggc tta get ggc gee gcc ate ggc agt gtt gga ctg ggg aag 3807 
Gly Ala Gly Leu Ala Gly Ala Ala He Gly Ser Val Gly Leu Gly Lys 
595 600 605 

gtc etc ata gae ate ctt gea ggg tat ggc geg ggc gtg gcg gga get 3855 
Val Leu lie Asp He Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala 
610 615 620 

ctt gtg gca ttc aag ate atg age ggt gag gtc eec tee acg gag gae 3903 
Leu Val Ala Phe Lys He Met Ser Gly Glu Val Pro Ser Thr Glu Asp 
625 630 635 

ctg gtc aat eta ctg ccc gcc ate etc teg eec gga gee etc gta gtc 3951 
Leu Val Asn Leu Leu Pro Ala He Leu Ser Pro Gly Ala Leu Val Val 
640 645 650 

ggc gtg gtc tgt gca gca ata ctg cge egg cae gtt ggc ccg ggc gag 3999 
Gly Val Val Cys Ala Ala He Leu Arg Arg His Val Gly Pro Gly Glu 
655 660 665 670 

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly
675 680 685 

aac cat gtt tec eec acg cae tac gtg ccg gag age gat gca get gcc 4095 
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala 
690 695 700 

cge gtc act gcc ata etc age age etc act gta ace cag etc ctg agg 4143 
Arg Val Thr Ala He Leu Ser Ser Leu Thr Val Thr Gin Leu Leu Arg 
705 710 715 

ega ctg cae eag tgg ata age teg gag tgt acc act eca tgc tec ggt 4191 
Arg Leu His Gin Trp He Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly 
720 725 730 

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp
735 740 745 750 

ttt aag acc tgg eta aaa get aag etc atg eca cag ctg cet ggg ate 4287 
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gin Leu Pro Gly He 
755 760 765 

ccc ttt gtg tee tgc cag cge ggg tat aag ggg gtc tgg ega ggg gae 4335 
Pro Phe Val Ser Cys Gin Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp 
770 775 780 

ggc ate atg cae act cge tgc cae tgt gga get gag ate act gga eat 4383 
Gly He Met His Thr Arg Cys His Cys Gly Ala Glu He Thr Gly His 
785 790 795 

gtc aaa aac ggg acg atg agg ate gtc ggt cet agg acc tgc agg aac 4431 
Val Lys Asn Gly Thr Met Arg He Val Gly Pro Arg Thr Cys Arg Asn 
800 805 810 
atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479

Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys

815 820 825 830 

acc ccc ctt cct gcg ccg aac tac acg ttc gcg eta tgg agg gtg tct 4527 

Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser 

835 840 845 

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575

Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val

850 855 860 

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca teg 4623 

Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gin Val Pro Ser 

865 870 875 

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc eta cat agg ttt gcg 4671 

Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala 

880 885 890 

ccc ccc tgc aag ccc ttg ctg egg gag gag gta tea ttc aga gta gga 4719 

Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly 

895 900 905 910 

etc cac gaa tac ccg gta ggg teg caa tta cct tgc gag ccc gaa ccg 4767 

Leu His Glu Tyr Pro Val Gly Ser Gin Leu Pro Cys Glu Pro Glu Pro 

915 920 925 

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815

Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr

930 935 940

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863

Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val 

945 950 955 

gee age tec teg get age cag eta tee get cca tct etc aag gca act 4911 

Ala Ser Ser Ser Ala Ser Gin Leu Ser Ala Pro Ser Leu Lys Ala Thr 

960 965 970 

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959

Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn

975 980 985 990 

etc eta tgg agg cag gag atg ggc ggc aac ate acc agg gtt gag tea 5007 

Leu Leu Trp Arg Gin Glu Met Gly Gly Asn lie Thr Arg Val Glu Ser 

995 1000 1005 

gaa aac aaa gtg gtg att ctg gac tec ttc gat ccg ctt gtg gcg gag 5055 

Glu Asn Lys Val Val He Leu Asp Ser Phe Asp Pro Leu Val Ala Glu 

1010 1015 1020 

gag gac gag egg gag ate tec gta ccc gca gaa ate ctg egg aag tct 5103 

Glu Asp Glu Arg Glu He Ser Val Pro Ala Glu He Leu Arg Lys Ser 

1025 1030 1035 

egg aga ttc gee cag gcc ctg ccc gtt tgg gcg egg ccg gac tat aac 5151 

Arg Arg Phe Ala Gin Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn 

1040 1045 1050 



ccc ccg eta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199 
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val 
1055 1060 1065 1070 

gtc cat ggc tgc ccg ctt cca cct cca aag tec cct cct gtg cct ccg 5247 

Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro 
1075 1080 1085 

cct egg aag aag egg acg gtg gtc etc act gaa tea acc eta tct act 5295 

Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr 
1090 1095 1100 

gee ttg gcc gag etc gee acc aga age ttt ggc age tec tea act tec 5343 
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser 
1105 1110 1115 

ggc att acg ggc gac aat acg aca aca tec tct gag ccc gee cct tct 5391 
Gly lie Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser 
1120 1125 1130 

ggc tgc ccc ccc gac tec gac get gag tec tat tec tec atg ccc ccc 5439 
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro 
1135 1140 1145 1150 

ctg gag ggg gag cct ggg gat ccg gat ctt age gac ggg tea tgg tea 5487 
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser 
1155 1160 1165 

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tea atg 5535 
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met 
1170 1175 1180 

tct tac tct tgg aca ggc gea etc gtc ace ccg tgc gcc gcg gaa gaa 5583 
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu 
1185 1190 1195 

cag aaa ctg ccc ate aat gea eta age aac teg ttg eta cgt cae eac 5631 
Gin Lys Leu Pro lie Asn Ala Leu Ser Asn Ser Leu Leu Arg His His 
1200 1205 1210 

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys
1215 1220 1225 1230 

aaa gtc aca ttt gac aga ctg caa gtt ctg gac age cat tac cag gac 5727 
Lys Val Thr Phe Asp Arg Leu Gin Val Leu Asp Ser His Tyr Gin Asp 
1235 1240 1245 

gta etc aag gag gtt aaa gea gcg gcg tea aaa gtg aag get aac ttg 5775 
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu 
1250 1255 1260 

eta tec gta gag gaa get tgc age ctg acg ccc cca cae tea gcc aaa 5823 
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys 
1265 1270 1275 

tec aag ttt ggt tat ggg gea aaa gac gtc cgt tgc cat gcc aga aag 5871 
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys 
1280 1285 1290 
gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn
1295 1300 1305 1310

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys
1315 1320 1325 

gtt cag cet gag aag ggg ggt cgt aag cea get egt etc ate gtg ttc 6015 
Val Gin Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu lie Val Phe 
1330 1335 1340 

cec gat ctg gge gtg egc gtg tgc gaa aag atg get ttg tac gac gtg 6063 
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val 
1345 1350 1355 

gtt aea aag etc cec ttg gee gtg atg gga age tec tac gga ttc caa 6111 
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gin 
1360 1365 1370 

tac tea eca gga eag egg gtt gaa tte etc gtg eaa geg tgg aag tec 6159 
Tyr Ser Pro Gly Gin Arg Val Glu Phe Leu Val Gin Ala Trp Lys Ser 
1375 1380 1385 1390 

aag aaa aee cea atg ggg tte teg tat gat ace ege tgc ttt gae tec 6207 
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser 
1395 1400 1405 

aea gte act gag age gae ate cgt acg gag gag gca ate tac eaa tgt 6255 
Thr Val Thr Glu Ser Asp He Arg Thr Glu Glu Ala He Tyr Gin Cys 
1410 1415 1420 

tgt gae etc gae eee eaa gee ege gtg gee ate aag tee etc aee gag 6303 
Cys Asp Leu Asp Pro Gin Ala Arg Val Ala He Lys Ser Leu Thr Glu 
1425 1430 1435 

agg ctt tat gtt ggg gge ect ctt aee aat tea agg ggg gag aac tgc 6351 
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys 
1440 1445 1450 

gge tat egc agg tgc egc geg age gge gta ctg aca act age tgt ggt 6399 
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly 
1455 1460 1465 1470 

aac ace etc act tgc tae ate aag gee egg gca gcc tgt cga gee gca 6447 
Asn Thr Leu Thr Cys Tyr He Lys Ala Arg Ala Ala Cys Arg Ala Ala 
1475 1480 1485 

ggg etc eag gae tgc aee atg etc gtg tgt gge gae gae tta gte gtt 6495 
Gly Leu Gin Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val 
1490 1495 1500 

ate tgt gaa age geg ggg gte eag gag gae geg geg age ctg aga gee 6543 
He Cys Glu Ser Ala Gly Val Gin Glu Asp Ala Ala Ser Leu Arg Ala 
1505 1510 1515 

ttc acg gag get atg acc agg tac tec gcc ecc cet ggg gae cec cea 6591 
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro 
1520 1525 1530 
caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val
1535 1540 1545 1550 

tea gtc gee eac gac ggc get gga aag agg gtc tae tac etc acc egt 6687 
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg 
1555 1560 1565 

gac ect aca acc ecc etc geg aga get gcg tgg gag aca gea aga eac 6735 
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His 
1570 1575 1580 

act cca gtc aat tec tgg eta ggc aae ata ate atg ttt gee cce aca 6783 
Thr Pro Val Asn Ser Trp Leu Gly Asn lie lie Met Phe Ala Pro Thr 
1585 1590 1595 

ctg tgg gcg agg atg ata ctg atg acc cat tte ttt age gtc ett ata 6831 
Leu Trp Ala Arg Met He Leu Met Thr His Phe Phe Ser Val Leu He 
1600 1605 1610 

gcc agg gac cag ett gaa cag gee etc gat tgc gag ate tae ggg gee 6879 
Ala Arg Asp Gin Leu Glu Gin Ala Leu Asp Cys Glu He Tyr Gly Ala 
1615 1620 1625 1630 

tge tac tec ata gaa cca ctg gat eta cct cca ate att caa aga etc 6927 
Cys Tyr Ser He Glu Pro Leu Asp Leu Pro Pro He He Gin Arg Leu 
1635 1640 1645 

cat ggc etc age gea ttt tea etc eac agt tac tct cca ggt gaa ate 6975 
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu He 
1650 1655 1660 

aat agg gtg gcc gea tge etc aga aaa ett ggg gta ccg cce ttg cga 7023 
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg 
1665 1670 1675 

get tgg aga cac egg gcc egg age gtc cgc get agg ett ctg gcc aga 7071 
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg 
1680 1685 1690 

gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val
1695 1700 1705 1710 

aga aca aag etc aaa etc act cca ata gcg gcc get ggc eag ctg gac 7167 
Arg Thr Lys Leu Lys Leu Thr Pro He Ala Ala Ala Gly Gin Leu Asp 
1715 1720 1725 

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His
1730 1735 1740 

age gtg tct eat gee egg cce cgc tgg ate tgg ttt tgc eta etc ctg 7263 
Ser Val Ser His Ala Arg Pro Arg Trp He Trp Phe Cys Leu Leu Leu 
1745 1750 1755 

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770 
ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372

atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432

ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492

tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552

ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612

gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672

gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732

ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792

ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852

cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912

ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972

tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032

gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092

tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152

gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212

tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272

ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332

agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392

gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452

attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512

agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572

atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632

cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692

ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752

agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812

tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872

gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932

caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992

ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052

gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112

tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172

tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232 

cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292

cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352 

gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412 

atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472 

agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532 

ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592 

aataggcgta tcacgaggcc ctttcgtc 9620 



<210> 2 
<211> 1771 
<212> PRT 

<213> Hepatitis C virus 
<220> 

<223> Description of Artificial Sequence: Hepatitis C pns345 
<400> 2 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 3 

<211> 9620 

<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1990)..(7302)
<220> 

<223> Description of Artificial Sequence: pDeltaNS3NS5 



<400> 3 
cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60

agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120

tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180

ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240

atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300

ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360

cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420

gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480

gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540

ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600

ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660

atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720

cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780

tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840

agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900

tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960

aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020

gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080

gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140

acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200

tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260

agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320

ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380 

actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440 

atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500 

agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560 

gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620 

gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680 

gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740

ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800 

agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860 

gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920 

acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980 

gaattcacc atg gct gca tat gca gct cag ggc tat aag gtg cta gta ctc 2031
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu
1 5 10

aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac atg tcc aag 2079
Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys
15 20 25 30

gct cat ggg atc gat cct aac atc agg acc ggg gtg aga aca att acc 2127
Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr
35 40 45

act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc ctt gcc gac 2175
Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp
50 55 60

ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt gac gag tgc 2223
Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys
65 70 75

cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act gtc ctt gac 2271
His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp
80 85 90

caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc acc gcc acc 2319
Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr
95 100 105 110

cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag gag gtt gct 2367
Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala
115 120 125

ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct atc ccc ctc 2415
Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu
130 135 140

gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat tca aag aag 2463
Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys
145 150 155

aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc atc aat gcc 2511
Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala
160 165 170

gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg acc agc ggc 2559
Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly
175 180 185 190

gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc tat acc ggc 2607
Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly
195 200 205

gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc cag aca gtc 2655
Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val
210 215 220

gat ttc agc ctt gac cct acc ttc acc att gag aca atc acg ctc ccc 2703
Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro
225 230 235

caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act ggc agg ggg 2751
Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly
240 245 250

aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc ccc tcc ggc 2799
Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly
255 260 265 270

atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca ggc tgt gct 2847
Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala
275 280 285

tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta cga gcg tac 2895
Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr
290 295 300

atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt gaa ttt tgg 2943
Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp
305 310 315

gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac ttt cta tcc 2991
Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser
320 325 330

cag aca aag cag agt ggg gag aac ctt cct tac ctg gta gcg tac caa 3039
Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln
335 340 345 350

gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg tgg gac cag 3087
Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln
355 360 365

atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat ggg cca aca 3135
Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr
370 375 380

ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc acc ctg acg 3183
Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr
385 390 395

cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc gac ctg gag 3231
His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu
400 405 410

gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg gct gct ttg 3279
Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu
415 420 425 430

gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg ggc agg gtc 3327
Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val
435 440 445

gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa gtc ctc tac 3375
Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr
450 455 460

cga gag ttc gat gag atg gaa gag tgc tct cag cac tta ccg tac atc 3423
Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile
465 470 475

gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag gcc ctc ggc 3471
Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly
480 485 490

ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc cct gct gtc 3519
Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val
495 500 505 510

cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag cat atg tgg 3567
Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp
515 520 525

aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca acg ctg cct 3615
Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro
530 535 540

ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct gct gtc acc 3663
Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr
545 550 555

agc cca cta acc act agc caa acc ctc ctc ttc aac ata ttg ggg ggg 3711
Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly
560 565 570

tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act gcc ttt gtg 3759
Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val
575 580 585 590

ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga ctg ggg aag 3807
Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys
595 600 605

gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg gcg gga gct 3855
Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala
610 615 620

ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc acg gag gac 3903
Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp
625 630 635

ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc ctc gta gtc 3951
Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val
640 645 650

ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc ccg ggc gag 3999
Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu
655 660 665 670

ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc tcc cgg ggg 4047
Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly
675 680 685

aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat gca gct gcc 4095
Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala
690 695 700

cgc gtc act gcc ata ctc agc agc ctc act gta acc cag ctc ctg agg 4143
Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg
705 710 715

cga ctg cac cag tgg ata agc tcg gag tgt acc act cca tgc tcc ggt 4191
Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly
720 725 730

tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg ttg agc gac 4239
Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp
735 740 745 750

ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg cct ggg atc 4287
Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile
755 760 765

ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg cga ggg gac 4335
Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp
770 775 780

ggc atc atg cac act cgc tgc cac tgt gga gct gag atc act gga cat 4383
Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His
785 790 795

gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc tgc agg aac 4431
Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn
800 805 810

atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg ggc ccc tgt 4479
Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys
815 820 825 830

acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg agg gtg tct 4527
Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser
835 840 845

gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc cac tac gtg 4575
Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val
850 855 860

acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag gtc cca tcg 4623
Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser
865 870 875

ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat agg ttt gcg 4671
Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala
880 885 890

ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc aga gta gga 4719
Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly
895 900 905 910

ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag ccc gaa ccg 4767
Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro
915 920 925

gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc cat ata aca 4815
Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr
930 935 940

gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc ccc tct gtg 4863
Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val
945 950 955

gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc aag gca act 4911
Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr
960 965 970

tgc acc gct aac cat gac tcc cct gat gct gag ctc ata gag gcc aac 4959
Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn
975 980 985 990

ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg gtt gag tca 5007
Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser
995 1000 1005

gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt gtg gcg gag 5055
Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu
1010 1015 1020

gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg cgg aag tct 5103
Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser
1025 1030 1035

cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg gac tat aac 5151
Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn
1040 1045 1050

ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa cca cct gtg 5199
Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val
1055 1060 1065 1070

gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct gtg cct ccg 5247
Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro
1075 1080 1085

cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc cta tct act 5295
Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr
1090 1095 1100

gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc tca act tcc 5343
Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser
1105 1110 1115

ggc att acg ggc gac aat acg aca aca tcc tct gag ccc gcc cct tct 5391
Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser
1120 1125 1130

ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc atg ccc ccc 5439
Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro
1135 1140 1145 1150

ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg tca tgg tca 5487
Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser
1155 1160 1165

acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc tgc tca atg 5535
Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met
1170 1175 1180

tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc gcg gaa gaa 5583
Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu
1185 1190 1195

cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta cgt cac cac 5631
Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His
1200 1205 1210

aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa agg cag aag 5679
Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys
1215 1220 1225 1230

aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat tac cag gac 5727
Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp
1235 1240 1245

gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag gct aac ttg 5775
Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu
1250 1255 1260

cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac tca gcc aaa 5823
Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys
1265 1270 1275

tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat gcc aga aag 5871
Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys
1280 1285 1290

gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg gaa gac aat 5919
Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn
1295 1300 1305 1310

gta aca cca ata gac act acc atc atg gct aag aac gag gtt ttc tgc 5967
Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys
1315 1320 1325

gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc atc gtg ttc 6015
Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe
1330 1335 1340

ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg tac gac gtg 6063
Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val
1345 1350 1355

gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac gga ttc caa 6111
Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln
1360 1365 1370

tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg tgg aag tcc 6159
Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser
1375 1380 1385 1390

aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc ttt gac tcc 6207
Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser
1395 1400 1405

aca gtc act gag agc gac atc cgt acg gag gag gca atc tac caa tgt 6255
Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys
1410 1415 1420

tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc ctc acc gag 6303
Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu
1425 1430 1435

agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg gag aac tgc 6351
Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys
1440 1445 1450

ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act agc tgt ggt 6399
Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly
1455 1460 1465 1470

aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt cga gcc gca 6447
Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala
1475 1480 1485

ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac tta gtc gtt 6495
Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val
1490 1495 1500

atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc ctg aga gcc 6543
Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala
1505 1510 1515

ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg gac ccc cca 6591
Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro
1520 1525 1530

caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc tcc aac gtg 6639
Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val
1535 1540 1545 1550

tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac ctc acc cgt 6687
Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg
1555 1560 1565

gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca gca aga cac 6735
Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His
1570 1575 1580

act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt gcc ccc aca 6783
Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr
1585 1590 1595



ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc gtc ctt ata 6831
Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile
1600 1605 1610

gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc tac ggg gcc 6879
Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala
1615 1620 1625 1630

tgc tac tcc ata gaa cca ctg gat cta cct cca atc att caa aga ctc 6927
Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu
1635 1640 1645

cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca ggt gaa atc 6975
His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile
1650 1655 1660

aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg ccc ttg cga 7023
Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg
1665 1670 1675

gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt ctg gcc aga 7071
Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg
1680 1685 1690

ggg ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac tgg gca gta 7119
Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val
1695 1700 1705 1710

aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc cag ctg gac 7167
Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp
1715 1720 1725

ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac att tat cac 7215
Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His
1730 1735 1740

agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc cta ctc ctg 7263
Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu
1745 1750 1755

ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga tgaaggttgg 7312
Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaaa ggcgcgccaa gatatcaagg 7372
atccactacg cgttagagct cgctgatcag cctcgactgt gccttctagt tgccagccat 7432
ctgttgtttg cccctccccc gtgccttcct tgaccctgga aggtgccact cccactgtcc 7492
tttcctaata aaatgaggaa attgcatcgc attgtctgag taggtgtcat tctattctgg 7552
ggggtggggt ggggcaggac agcaaggggg aggattggga agacaatagc aggcatgctg 7612
gggagctctt ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga 7672
gcggtatcag ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca 7732
ggaaagaaca tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg 7792 
ctggcgtttt tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt 7852 
cagaggtggc gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc 7912 
ctcgtgcgct ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct 7972 
tcgggaagcg tggcgctttc tcaatgctca cgctgtaggt atctcagttc ggtgtaggtc 8032 
gttcgctcca agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta 8092 
tccggtaact atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca 8152 
gccactggta acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag 8212 
tggtggccta actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag 8272 
ccagttacct tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt 8332 
agcggtggtt tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa 8392 
gatcctttga tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg 8452 
attttggtca tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga 8512 
agttttaaat caatctaaag tatatatgag taaacttggt ctgacagtta ccaatgctta 8572 
atcagtgagg cacctatctc agcgatctgt ctatttcgtt catccatagt tgcctgactc 8632 
cccgtcgtgt agataactac gatacgggag ggcttaccat ctggccccag tgctgcaatg 8692 
ataccgcgag acccacgctc accggctcca gatttatcag caataaacca gccagccgga 8752 
agggccgagc gcagaagtgg tcctgcaact ttatccgcct ccatccagtc tattaattgt 8812 
tgccgggaag ctagagtaag tagttcgcca gttaatagtt tgcgcaacgt tgttgccatt 8872 
gctacaggca tcgtggtgtc acgctcgtcg tttggtatgg cttcattcag ctccggttcc 8932 
caacgatcaa ggcgagttac atgatccccc atgttgtgca aaaaagcggt tagctccttc 8992 
ggtcctccga tcgttgtcag aagtaagttg gccgcagtgt tatcactcat ggttatggca 9052 
gcactgcata attctcttac tgtcatgcca tccgtaagat gcttttctgt gactggtgag 9112 
tactcaacca agtcattctg agaatagtgt atgcggcgac cgagttgctc ttgcccggcg 9172 
tcaatacggg ataataccgc gccacatagc agaactttaa aagtgctcat cattggaaaa 9232 
cgttcttcgg ggcgaaaact ctcaaggatc ttaccgctgt tgagatccag ttcgatgtaa 9292 
cccactcgtg cacccaactg atcttcagca tcttttactt tcaccagcgt ttctgggtga 9352 
gcaaaaacag gaaggcaaaa tgccgcaaaa aagggaataa gggcgacacg gaaatgttga 9412 
atactcatac tcttcctttt tcaatattat tgaagcattt atcagggtta ttgtctcatg 9472
agcggataca tatttgaatg tatttagaaa aataaacaaa taggggttcc gcgcacattt 9532
ccccgaaaag tgccacctga cgtctaagaa accattatta tcatgacatt aacctataaa 9592
aataggcgta tcacgaggcc ctttcgtc 9620



<210> 4
<211> 1771
<212> PRT

<213> Artificial Sequence
<400> 4

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270



Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 5 

<211> 4282 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: pCMVII 

<400> 5 

tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60
cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120
ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180 
accatatgaa gctttttgca aaagcctagg cctccaaaaa agcctcctca ctacttctgg 240 
aatagctcag aggccgaggc ggcctcggcc tctgcataaa taaaaaaaat tagtcagcca 300 
tggggcggag aatgggcgga actgggcggg gagggaatta ttggctattg gccattgcat 360 
acgttgtatc tatatcataa tatgtacatt tatattggct catgtccaat atgaccgcca 420 
tgttgacatt gattattgac tagttattaa tagtaatcaa ttacggggtc attagttcat 480 
agcccatata tggagttccg cgttacataa cttacggtaa atggcccgcc tggctgaccg 540 
cccaacgacc cccgcccatt gacgtcaata atgacgtatg ttcccatagt aacgccaata 600 
gggactttcc attgacgtca atgggtggag tatttacggt aaactgccca cttggcagta 660 
catcaagtgt atcatatgcc aagtccgccc cctattgacg tcaatgacgg taaatggccc 720 
gcctggcatt atgcccagta catgacctta cgggactttc ctacttggca gtacatctac 780 
gtattagtca tcgctattac catggtgatg cggttttggc agtacaccaa tgggcgtgga 840 
tagcggtttg actcacgggg atttccaagt ctccacccca ttgacgtcaa tgggagtttg 900 
ttttggcacc aaaatcaacg ggactttcca aaatgtcgta ataaccccgc cccgttgacg 960 
caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctcg tttagtgaac 1020 
cgtcagatcg cctggagacg ccatccacgc tgttttgacc tccatagaag acaccgggac 1080 
cgatccagcc tccgcggccg ggaacggtgc attggaacgc ggattccccg tgccaagagt 1140 
gacgtaagta ccgcctatag actctatagg cacacccctt tggctcttat gcatgctata 1200
ctgtttttgg cttggggcct atacaccccc gcttccttat gctataggtg atggtatagc 1260 
ttagcctata ggtgtgggtt attgaccatt attgaccact cccctattgg tgacgatact 1320 
ttccattact aatccataac atggctcttt gccacaacta tctctattgg ctatatgcca 1380 
atactctgtc cttcagagac tgacacggac tctgtatttt tacaggatgg ggtcccattt 1440 
attatttaca aattcacata tacaacaacg ccgtcccccg tgcccgcagt ttttattaaa 1500 
catagcgtgg gatctccacg cgaatctcgg gtacgtgttc cggacatggg ctcttctccg 1560 
gtagcggcgg agcttccaca tccgagccct ggtcccatgc ctccagcggc tcatggtcgc 1620 
tcggcagctc cttgctccta acagtggagg ccagacttag gcacagcaca atgcccacca 1680 
ccaccagtgt gccgcacaag gccgtggcgg tagggtatgt gtctgaaaat gagctcggag 1740 
attgggctcg caccgctgac gcagatggaa gacttaaggc agcggcagaa gaagatgcag 1800 
gcagctgagt tgttgtattc tgataagagt cagaggtaac tcccgttgcg gtgctgttaa 1860
cggtggaggg cagtgtagtc tgagcagtac tcgttgctgc cgcgcgcgcc accagacata 1920
atagctgaca gactaacaga ctgttccttt ccatgggtct tttctgcagt caccgtcgtc 1980
gacctaagaa ttcagactcg agcaagtcta gaaaggcgcg ccaagatatc aaggatccac 2040
tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag ccatctgttg 2100
tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact gtcctttcct 2160
aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt ctggggggtg 2220
gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat gctggggagc 2280
tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg gcgagcggta 2340
tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa cgcaggaaag 2400
aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc gttgctggcg 2460
tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc aagtcagagg 2520
tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag ctccctcgtg 2580
cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct cccttcggga 2640
agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta ggtcgttcgc 2700
tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc cttatccggt 2760
aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc agcagccact 2820
ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt gaagtggtgg 2880
cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct gaagccagtt 2940
accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc tggtagcggt 3000
ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca agaagatcct 3060
ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta agggattttg 3120
gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa atgaagtttt 3180
aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg cttaatcagt 3240
gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg actccccgtc 3300
gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc aatgataccg 3360
cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc cggaagggcc 3420
gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa ttgttgccgg 3480
gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc cattgctaca 3540
ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg ttcccaacga 3600
tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc cttcggtcct 3660
ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat ggcagcactg 3720
cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg tgagtactca 3780 
accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc ggcgtcaata 3840 
cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg aaaacgttct 3900 
tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat gtaacccact 3960 
cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg gtgagcaaaa 4020 
acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg ttgaatactc 4080 
atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct catgagcgga 4140 
tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac atttccccga 4200
aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta taaaaatagg 4260 
cgtatcacga ggccctttcg tc 4282

<210> 6
<211> 6299
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: pNS34a

<220>
<221> CDS
<222> (1990)..(4047)

<400> 6

cgcgcgtttc ggtgatgacg gtgaaaacct ctgacacatg cagctcccgg agacggtcac 60 
agcttgtctg taagcggatg ccgggagcag acaagcccgt cagggcgcgt cagcgggtgt 120 
tggcgggtgt cggggctggc ttaactatgc ggcatcagag cagattgtac tgagagtgca 180 
ccatatgaag ctttttgcaa aagcctaggc ctccaaaaaa gcctcctcac tacttctgga 240 
atagctcaga ggccgaggcg gcctcggcct ctgcataaat aaaaaaaatt agtcagccat 300 
ggggcggaga atgggcggaa ctgggcgggg agggaattat tggctattgg ccattgcata 360 
cgttgtatct atatcataat atgtacattt atattggctc atgtccaata tgaccgccat 420 
gttgacattg attattgact agttattaat agtaatcaat tacggggtca ttagttcata 480 
gcccatatat ggagttccgc gttacataac ttacggtaaa tggcccgcct ggctgaccgc 540 
ccaacgaccc ccgcccattg acgtcaataa tgacgtatgt tcccatagta acgccaatag 600 
ggactttcca ttgacgtcaa tgggtggagt atttacggta aactgcccac ttggcagtac 660 
atcaagtgta tcatatgcca agtccgcccc ctattgacgt caatgacggt aaatggcccg 720 

cctggcatta tgcccagtac atgaccttac gggactttcc tacttggcag tacatctacg 780

tattagtcat cgctattacc atggtgatgc ggttttggca gtacaccaat gggcgtggat 840 

agcggtttga ctcacgggga tttccaagtc tccaccccat tgacgtcaat gggagtttgt 900 

tttggcacca aaatcaacgg gactttccaa aatgtcgtaa taaccccgcc ccgttgacgc 960 

aaatgggcgg taggcgtgta cggtgggagg tctatataag cagagctcgt ttagtgaacc 1020 

gtcagatcgc ctggagacgc catccacgct gttttgacct ccatagaaga caccgggacc 1080 

gatccagcct ccgcggccgg gaacggtgca ttggaacgcg gattccccgt gccaagagtg 1140 

acgtaagtac cgcctataga ctctataggc acaccccttt ggctcttatg catgctatac 1200 

tgtttttggc ttggggccta tacacccccg ctccttatgc tataggtgat ggtatagctt 1260 

agcctatagg tgtgggttat tgaccattat tgaccactcc cctattggtg acgatacttt 1320 

ccattactaa tccataacat ggctctttgc cacaactatc tctattggct atatgccaat 1380 

actctgtcct tcagagactg acacggactc tgtattttta caggatgggg tccatttatt 1440 

atttacaaat tcacatatac aacaacgccg tcccccgtgc ccgcagtttt tattaaacat 1500 

agcgtgggat ctccgacatc tcgggtacgt gttccggaca tgggctcttc tccggtagcg 1560 

gcggagcttc cacatccgag ccctggtccc atccgtccag cggctcatgg tcgctcggca 1620 

gctccttgct cctaacagtg gaggccagac ttaggcacag cacaatgccc accaccacca 1680 

gtgtgccgca caaggccgtg gcggtagggt atgtgtctga aaatgagctc ggagattggg 1740 

ctcgcacctg gacgcagatg gaagacttaa ggcagcggca gaagaagatg caggcagctg 1800 

agttgttgta ttctgataag agtcagaggt aactcccgtt gcggtgctgt taacggtgga 1860 

gggcagtgta gtctgagcag tactcgttgc tgccgcgcgc gccaccagac ataatagctg 1920 

acagactaac agactgttcc tttccatggg tcttttctgc agtcaccgtc gtcgacctaa 1980 

gaattcacc atg gcg ccc atc acg gcg tac gcc cag cag aca agg ggc ctc 2031
Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu
1 5 10

cta ggg tgc ata atc acc agc cta act ggc cgg gac aaa aac caa gtg 2079
Leu Gly Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val
15 20 25 30

gag ggt gag gtc cag att gtg tca act gct gcc caa acc ttc ctg gca 2127
Glu Gly Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala
35 40 45

acg tgc atc aat ggg gtg tgc tgg act gtc tac cac ggg gcc gga acg 2175
Thr Cys Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr
50 55 60

agg acc atc gcg tca ccc aag ggt cct gtc atc cag atg tat acc aat 2223
Arg Thr Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn
65 70 75

gta gac caa gac ctt gtg ggc tgg ccc gct tcg caa ggt acc cgc tca 2271
Val Asp Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser
80 85 90

ttg aca ccc tgc act tgc ggc tcc tcg gac ctt tac ctg gtc acg agg 2319
Leu Thr Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg
95 100 105 110

cac gcc gat gtc att ccc gtg cgc cgg cgg ggt gat agc agg ggc agc 2367
His Ala Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser
115 120 125

ctg ctg tcg ccc cgg ccc att tcc tac ttg aaa ggc tcc tcg ggg ggt 2415
Leu Leu Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly
130 135 140

ccg ctg ttg tgc ccc gcg ggg cac gcc gtg ggc ata ttt agg gcc gcg 2463
Pro Leu Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala
145 150 155

gtg tgc acc cgt gga gtg gct aag gcg gtg gac ttt atc cct gtg gag 2511
Val Cys Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu
160 165 170

aac cta gag aca acc atg agg tcc ccg gtg ttc acg gat aac tcc tct 2559
Asn Leu Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser
175 180 185 190

cca cca gta gtg ccc cag agc ttc cag gtg gct cac ctc cat gct ccc 2607
Pro Pro Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro
195 200 205

aca ggc agc ggc aaa agc acc aag gtc ccg gct gca tat gca gct cag 2655
Thr Gly Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln
210 215 220

ggc tat aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc 2703
Gly Tyr Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly
225 230 235

ttt ggt gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg 2751
Phe Gly Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg
240 245 250

acc ggg gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc 2799
Thr Gly Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr
255 260 265 270

tac ggc aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac 2847
Tyr Gly Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp
275 280 285

ata ata att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg 2895
Ile Ile Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu
290 295 300

ggc att ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg 2943
Gly Ile Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu
305 310 315

gtt gtg ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat 2991
Val Val Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His
320 325 330

ccc aac atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt 3039
Pro Asn Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe
335 340 345 350

tac ggc aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc 3087
Tyr Gly Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu
355 360 365

atc ttc tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg 3135
Ile Phe Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu
370 375 380

gtc gca ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg 3183
Val Ala Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val
385 390 395

tcc gtc atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc 3231
Ser Val Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala
400 405 410

ctc atg acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat 3279
Leu Met Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn
415 420 425 430

acg tgt gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc 3327
Thr Cys Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr
435 440 445

att gag aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt 3375
Ile Glu Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg
450 455 460

cgg ggc agg act ggc agg ggg aag cca ggc atc tac aga ttt gtg gca 3423
Arg Gly Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala
465 470 475

ccg ggg gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag 3471
Pro Gly Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu
480 485 490

tgc tat gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act 3519
Cys Tyr Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr
495 500 505 510

aca gtt agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc 3567
Thr Val Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys
515 520 525

cag gac cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat 3615
Gln Asp His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His
530 535 540

ata gat gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt 3663
Ile Asp Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu
545 550 555

cct tac ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc 3711
Pro Tyr Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala
560 565 570

cct ccc cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag 3759
Pro Pro Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys
575 580 585 590

ccc acc ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt 3807
Pro Thr Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val
595 600 605

cag aat gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca 3855
Gln Asn Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr
610 615 620

tgc atg tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt 3903
Cys Met Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val
625 630 635

ggc ggc gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc 3951
Gly Gly Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys
640 645 650

gtg gtc ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata 3999
Val Val Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile
655 660 665 670

cct gac agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc 4047
Pro Asp Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys
675 680 685

taggatccac tacgcgttag agctcgctga tcagcctcga ctgtgccttc tagttgccag 4107
ccatctgttg tttgcccctc ccccgtgcct tccttgaccc tggaaggtgc cactcccact 4167
gtcctttcct aataaaatga ggaaattgca tcgcattgtc tgagtaggtg tcattctatt 4227
ctggggggtg gggtggggca ggacagcaag ggggaggatt gggaagacaa tagcaggcat 4287
gctggggagc tcttccgctt cctcgctcac tgactcgctg cgctcggtcg ttcggctgcg 4347
gcgagcggta tcagctcact caaaggcggt aatacggtta tccacagaat caggggataa 4407
cgcaggaaag aacatgtgag caaaaggcca gcaaaaggcc aggaaccgta aaaaggccgc 4467
gttgctggcg tttttccata ggctccgccc ccctgacgag catcacaaaa atcgacgctc 4527
aagtcagagg tggcgaaacc cgacaggact ataaagatac caggcgtttc cccctggaag 4587
ctccctcgtg cgctctcctg ttccgaccct gccgcttacc ggatacctgt ccgcctttct 4647
cccttcggga agcgtggcgc tttctcaatg ctcacgctgt aggtatctca gttcggtgta 4707
ggtcgttcgc tccaagctgg gctgtgtgca cgaacccccc gttcagcccg accgctgcgc 4767
cttatccggt aactatcgtc ttgagtccaa cccggtaaga cacgacttat cgccactggc 4827
agcagccact ggtaacagga ttagcagagc gaggtatgta ggcggtgcta cagagttctt 4887 
gaagtggtgg cctaactacg gctacactag aaggacagta tttggtatct gcgctctgct 4947 
gaagccagtt accttcggaa aaagagttgg tagctcttga tccggcaaac aaaccaccgc 5007 
tggtagcggt ggtttttttg tttgcaagca gcagattacg cgcagaaaaa aaggatctca 5067 
agaagatcct ttgatctttt ctacggggtc tgacgctcag tggaacgaaa actcacgtta 5127 
agggattttg gtcatgagat tatcaaaaag gatcttcacc tagatccttt taaattaaaa 5187 
atgaagtttt aaatcaatct aaagtatata tgagtaaact tggtctgaca gttaccaatg 5247 
cttaatcagt gaggcaccta tctcagcgat ctgtctattt cgttcatcca tagttgcctg 5307 
actccccgtc gtgtagataa ctacgatacg ggagggctta ccatctggcc ccagtgctgc 5367 
aatgataccg cgagacccac gctcaccggc tccagattta tcagcaataa accagccagc 5427 
cggaagggcc gagcgcagaa gtggtcctgc aactttatcc gcctccatcc agtctattaa 5487
ttgttgccgg gaagctagag taagtagttc gccagttaat agtttgcgca acgttgttgc 5547 
cattgctaca ggcatcgtgg tgtcacgctc gtcgtttggt atggcttcat tcagctccgg 5607 
ttcccaacga tcaaggcgag ttacatgatc ccccatgttg tgcaaaaaag cggttagctc 5667 
cttcggtcct ccgatcgttg tcagaagtaa gttggccgca gtgttatcac tcatggttat 5727 
ggcagcactg cataattctc ttactgtcat gccatccgta agatgctttt ctgtgactgg 5787 
tgagtactca accaagtcat tctgagaata gtgtatgcgg cgaccgagtt gctcttgccc 5847 
ggcgtcaata cgggataata ccgcgccaca tagcagaact ttaaaagtgc tcatcattgg 5907 
aaaacgttct tcggggcgaa aactctcaag gatcttaccg ctgttgagat ccagttcgat 5967 
gtaacccact cgtgcaccca actgatcttc agcatctttt actttcacca gcgtttctgg 6027 
gtgagcaaaa acaggaaggc aaaatgccgc aaaaaaggga ataagggcga cacggaaatg 6087 
ttgaatactc atactcttcc tttttcaata ttattgaagc atttatcagg gttattgtct 6147 
catgagcgga tacatatttg aatgtattta gaaaaataaa caaatagggg ttccgcgcac 6207 
atttccccga aaagtgccac ctgacgtcta agaaaccatt attatcatga cattaaccta 6267 
taaaaatagg cgtatcacga ggccctttcg tc 6299

<210> 7
<211> 686
<212> PRT
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: pNS34a

<400> 7

Met Ala Pro Ile Thr Ala Tyr Ala Gln Gln Thr Arg Gly Leu Leu Gly
1 5 10 15

Cys Ile Ile Thr Ser Leu Thr Gly Arg Asp Lys Asn Gln Val Glu Gly
20 25 30

Glu Val Gln Ile Val Ser Thr Ala Ala Gln Thr Phe Leu Ala Thr Cys
35 40 45

Ile Asn Gly Val Cys Trp Thr Val Tyr His Gly Ala Gly Thr Arg Thr
50 55 60

Ile Ala Ser Pro Lys Gly Pro Val Ile Gln Met Tyr Thr Asn Val Asp
65 70 75 80

Gln Asp Leu Val Gly Trp Pro Ala Ser Gln Gly Thr Arg Ser Leu Thr
85 90 95

Pro Cys Thr Cys Gly Ser Ser Asp Leu Tyr Leu Val Thr Arg His Ala
100 105 110

Asp Val Ile Pro Val Arg Arg Arg Gly Asp Ser Arg Gly Ser Leu Leu
115 120 125

Ser Pro Arg Pro Ile Ser Tyr Leu Lys Gly Ser Ser Gly Gly Pro Leu
130 135 140

Leu Cys Pro Ala Gly His Ala Val Gly Ile Phe Arg Ala Ala Val Cys
145 150 155 160

Thr Arg Gly Val Ala Lys Ala Val Asp Phe Ile Pro Val Glu Asn Leu
165 170 175

Glu Thr Thr Met Arg Ser Pro Val Phe Thr Asp Asn Ser Ser Pro Pro
180 185 190

Val Val Pro Gln Ser Phe Gln Val Ala His Leu His Ala Pro Thr Gly
195 200 205

Ser Gly Lys Ser Thr Lys Val Pro Ala Ala Tyr Ala Ala Gln Gly Tyr
210 215 220

Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly
225 230 235 240

Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly
245 250 255

Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly
260 265 270

Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile
275 280 285

Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile
290 295 300

Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val
305 310 315 320

Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn
325 330 335

Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly
340 345 350

Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe
355 360 365

Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala
370 375 380

Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val
385 390 395 400

Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met
405 410 415

Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys
420 425 430

Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu
435 440 445

Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly
450 455 460

Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly
465 470 475 480

Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr
485 490 495

Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val
500 505 510

Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp
515 520 525

His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp
530 535 540

Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr
545 550 555 560

Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro
565 570 575

Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr
580 585 590

Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn
595 600 605



Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met
610 615 620

Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly
625 630 635 640

Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val
645 650 655

Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp
660 665 670

Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys
675 680 685

<210> 8
<211> 19912
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: pd.deltaNS3NS5

<220>
<221> CDS
<222> (12745)..(18057)

<400> 8



atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
gaaacatgct gcttaaaact ccaagcggta ggagaccgat aaaggttaat aggacagccg 2880
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2940
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 3000
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3060
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3120
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3180
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3240
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3300
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3360
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3420
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3480
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3540
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3600
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3660
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3720
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3780
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3840
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3900
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3960
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 4020
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4080
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4140
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4200
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4260
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4320
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4380
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4440
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4500
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4560
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4620
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4680
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4740
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4800
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4860
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4920
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4980
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 5040
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5100
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5160
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5220
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5280
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5340
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5400
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5460
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5520
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5580
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5640
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5700
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5760
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5820
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5880
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5940
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 6000
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6060
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6120
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6180
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6240
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6300
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6360
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6420
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6480
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6540
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6600
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6660
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6720
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6780
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6840
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6900
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6960
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 7020
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7080
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7140
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7200
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7260
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7320
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7380
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7440
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7500
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7560
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7620
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7680
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7740
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7800
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7860
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7920
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7980
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 8040
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8100
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8160
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8220
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8280
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8340 
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8400
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8460 
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8520 
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8580 
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8640 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8700 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8760 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8820 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8880 
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8940 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 9000 
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9060 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9120
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9180
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9240 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9300 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9360 
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9420 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9480 
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9540
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9600 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9660 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9720 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9780 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9840
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9900 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9960 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 10020 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10080
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10140
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10200 
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10260 
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10320 
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10380 
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10440 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10500 
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10560 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10620 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10680 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10740 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10800 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10860
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10920 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10980 
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 11040 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11100 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11160 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11220 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11280 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11340 
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11400 
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11460 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11520 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11580
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11640 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11700 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11760 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11820 
tcaaattata w SmA W4 Wft \^ z3 ^ oaciaa tac t c u^ddugcCCu caneuacggg acttccggga aacacagtac 11880
caatachtcc ^^dy dye tea ccgcccgttt gaagagacta atcaaagaat 11940
cgttttctca aaaaaat" t'aa L.dL.I^UUddLrC gauagcucga tcaaaggggc aaaacgtagg 12000
y u tcaaa Cuccetgatg ccaagaactc taaccagtct 12060
L.CI L.L« UdClCiClCl I. ugcc c L.aug acccgcccct ccggttacag cctgtgtaac tgattaatcc 12120
t" fir* r» 1" t" t" r* t" a d U U-d^ a L. L> t« caaugcccca attaagggat tttgtcttca ttaacggctt 12180
L. v# ^ L. L. C a. I. 0.0. daacgcuaug acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12240
a ^ ^ ^ 1^ ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12300
dgagaacgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12360
O O ^ a ^ ^ Q aaCCaCa^ag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12420
caacct^cc^t ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12480


tttcttacac ctcctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12540
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12600
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12660
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12720
acaagcttac aaaacaaatt cacc atg gct gca tat gca gct cag ggc tat 12771
Met Ala Ala Tyr Ala Ala Gln Gly Tyr
1 5

aag gtg cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt 12819
Lys Val Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly
10 15 20 25

gct tac atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg 12867
Ala Tyr Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly
30 35 40

gtg aga aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc 12915
Val Arg Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly
45 50 55

aag ttc ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata 12963
Lys Phe Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile
60 65 70

att tgt gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att 13011
Ile Cys Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile
75 80 85

ggc act gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg 13059
Gly Thr Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val
90 95 100 105

ctc gcc acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac 13107






Leu Ala Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn
110 115 120

atc gag gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc 13155
Ile Glu Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly
125 130 135

aag gct atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc 13203
Lys Ala Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe
140 145 150

tgt cat tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca 13251
Cys His Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala
155 160 165

ttg ggc atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc 13299
Leu Gly Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val
170 175 180 185

atc ccg acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg 13347
Ile Pro Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met
190 195 200

acc ggc tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt 13395
Thr Gly Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys
205 210 215

gtc acc cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag 13443
Val Thr Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu
220 225 230

aca atc acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc 13491
Thr Ile Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly
235 240 245

agg act ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg 13539
Arg Thr Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly
250 255 260 265

gag cgc ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat 13587
Glu Arg Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr
270 275 280

gac gca ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt 13635
Asp Ala Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val
285 290 295

agg cta cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac 13683
Arg Leu Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp
300 305 310

cat ctt gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat 13731
His Leu Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp
315 320 325

gcc cac ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac 13779
Ala His Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr
330 335 340 345
ctg gta gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc 13827
Leu Val Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro
350 355 360

cca tcg tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc 13875
Pro Ser Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr
365 370 375

ctc cat ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat 13923
Leu His Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn
380 385 390

gaa atc acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg 13971
Glu Ile Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met
395 400 405

tcg gcc gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc 14019
Ser Ala Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly
410 415 420 425

gtc ctg gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc 14067
Val Leu Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val
430 435 440

ata gtg ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac 14115
Ile Val Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp
445 450 455

agg gaa gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag 14163
Arg Glu Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln
460 465 470

cac tta ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag 14211
His Leu Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys
475 480 485

cag aag gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt 14259
Gln Lys Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val
490 495 500 505

atc gcc cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg 14307
Ile Ala Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp
510 515 520

gcg aag cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc 14355
Ala Lys His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly
525 530 535

ttg tca acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt 14403
Leu Ser Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe
540 545 550

aca gct gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc 14451
Thr Ala Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe
555 560 565

aac ata ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc 14499
Asn Ile Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala
570 575 580 585
gct act gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt 14547
Ala Thr Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser
590 595 600

gtt gga ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg 14595
Val Gly Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala
605 610 615

ggc gtg gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc 14643
Gly Val Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val
620 625 630

ccc tcc acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc 14691
Pro Ser Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro
635 640 645

gga gcc ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac 14739
Gly Ala Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His
650 655 660 665

gtt ggc ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc 14787
Val Gly Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala
670 675 680

ttc gcc tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag 14835
Phe Ala Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu
685 690 695

agc gat gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta 14883
Ser Asp Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val
700 705 710

acc cag ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc 14931
Thr Gln Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr
715 720 725

act cca tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc 14979
Thr Pro Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys
730 735 740 745

gag gtg ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca 15027
Glu Val Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro
750 755 760

cag ctg cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg 15075
Gln Leu Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly
765 770 775

gtc tgg cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct 15123
Val Trp Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala
780 785 790

gag atc act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct 15171
Glu Ile Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro
795 800 805

agg acc tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac 15219
Arg Thr Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr
810 815 820 825
acc acg ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg 15267
Thr Thr Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala
830 835 840

cta tgg agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg 15315
Leu Trp Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly
845 850 855

gac ttc cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg 15363
Asp Phe His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro
860 865 870

tgc cag gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc 15411
Cys Gln Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg
875 880 885

cta cat agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta 15459
Leu His Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val
890 895 900 905

tca ttc aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct 15507
Ser Phe Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro
910 915 920

tgc gag ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat 15555
Cys Glu Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp
925 930 935

ccc tcc cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga 15603
Pro Ser His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly
940 945 950

tca ccc ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca 15651
Ser Pro Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro
955 960 965

tct ctc aag gca act tgc acc gct aac cat gac tcc cct gat gct gag 15699
Ser Leu Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu
970 975 980 985

ctc ata gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc 15747
Leu Ile Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile
990 995 1000

acc agg gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat 15795
Thr Arg Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp
1005 1010 1015

ccg ctt gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa 15843
Pro Leu Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu
1020 1025 1030

atc ctg cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg 15891
Ile Leu Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala
1035 1040 1045

cgg ccg gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac 15939
Arg Pro Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp
1050 1055 1060 1065
tac gaa cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc 15987
Tyr Glu Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser
1070 1075 1080

cct cct gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa 16035
Pro Pro Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu
1085 1090 1095

tca acc cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc 16083
Ser Thr Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly
1100 1105 1110

agc tcc tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct 16131
Ser Ser Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser
1115 1120 1125

gag ccc gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat 16179
Glu Pro Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr
1130 1135 1140 1145

tcc tcc atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc 16227
Ser Ser Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser
1150 1155 1160

gac ggg tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc 16275
Asp Gly Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val
1165 1170 1175

gtg tgc tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg 16323
Val Cys Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro
1180 1185 1190

tgc gcc gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg 16371
Cys Ala Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser
1195 1200 1205

ttg cta cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct 16419
Leu Leu Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala
1210 1215 1220 1225

tgc caa agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac 16467
Cys Gln Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp
1230 1235 1240

agc cat tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa 16515
Ser His Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys
1245 1250 1255

gtg aag gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc 16563
Val Lys Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro
1260 1265 1270

cca cac tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt 16611
Pro His Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg
1275 1280 1285

tgc cat gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac 16659
Cys His Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp
1290 1295 1300 1305
ctt ctg gaa gac aat gta aca cca ata gac act acc atc atg gct aag 16707
Leu Leu Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys
1310 1315 1320

aac gag gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct 16755
Asn Glu Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala
1325 1330 1335

cgt ctc atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg 16803
Arg Leu Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met
1340 1345 1350

gct ttg tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc 16851
Ala Leu Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser
1355 1360 1365

tcc tac gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg 16899
Ser Tyr Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val
1370 1375 1380 1385

caa gcg tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc 16947
Gln Ala Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr
1390 1395 1400

cgc tgc ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag 16995
Arg Cys Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu
1405 1410 1415

gca atc tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc 17043
Ala Ile Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile
1420 1425 1430

aag tcc ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca 17091
Lys Ser Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser
1435 1440 1445

agg ggg gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg 17139
Arg Gly Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu
1450 1455 1460 1465

aca act agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca 17187
Thr Thr Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala
1470 1475 1480

gcc tgt cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc 17235
Ala Cys Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly
1485 1490 1495

gac gac tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg 17283
Asp Asp Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala
1500 1505 1510

gcg agc ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc 17331
Ala Ser Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro
1515 1520 1525

cct ggg gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca 17379
Pro Gly Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser
1530 1535 1540 1545
tgc tcc tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc 17427
Cys Ser Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val
1550 1555 1560

tac tac ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg 17475
Tyr Tyr Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp
1565 1570 1575

gag aca gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc 17523
Glu Thr Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile
1580 1585 1590

atg ttt gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc 17571
Met Phe Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe
1595 1600 1605

ttt agc gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc 17619
Phe Ser Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys
1610 1615 1620 1625

gag atc tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca 17667
Glu Ile Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro
1630 1635 1640

atc att caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac 17715
Ile Ile Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr
1645 1650 1655

tct cca ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg 17763
Ser Pro Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly
1660 1665 1670

gta ccg ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct 17811
Val Pro Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala
1675 1680 1685

agg ctt ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc 17859
Arg Leu Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu
1690 1695 1700 1705

ttc aac tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc 17907
Phe Asn Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala
1710 1715 1720

gct ggc cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg 17955
Ala Gly Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly
1725 1730 1735

gga gac att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg 18003
Gly Asp Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp
1740 1745 1750

ttt tgc cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc 18051
Phe Cys Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro
1755 1760 1765

aac cga tgaaggttgg ggtaaacact ccggcctaaa aaaaaaaaaa aatctagaac 18107
Asn Arg
1770
ccgagtcgac tttgttccca ctgtactttt agctcgtaca aaatacaata tacttttcat 18167
ttctccgtaa acaacatgtt ttcccatgta atatcctttt ctatttttcg ttccgttacc 18227 
aactttacac atactttata tagctattca cttctataca ctaaaaaact aagacaattt 18287 
taattttgct gcctgccata tttcaatttg ttataaattc ctataattta tcctattagt 18347 
agctaaaaaa agatgaatgt gaatcgaatc ctaagagaat tggatctgat ccacaggacg 18407 
ggtgtggtcg ccatgatcgc gtagtcgata gtggctccaa gtagcgaagc gagcaggact 18467
gggcggcggc caaagcggtc ggacagtgct ccgagaacgg gtgcgcatag aaattgcatc 18527 
aacgcatata gcgctagcag cacgccatag tgactggcga tgctgtcgga atggacgata 18587 
tcccgcaaga ggcccggcag taccggcata accaagccta tgcctacagc atccagggtg 18647 
acggtgccga ggatgacgat gagcgcattg ttagatttca tacacggtgc ctgactgcgt 18707 
tagcaattta actgtgataa actaccgcat taaagctttt tctttccaat tttttttttt 18767 
tcgtcattat aaaaatcatt acgaccgaga ttcccgggta ataactgata taattaaatt 18827 
gaagctctaa tttgtgagtt tagtatacat gcatttactt ataatacagt tttttagttt 18887 
tgctggccgc atcttctcaa atatgcttcc cagcctgctt ttctgtaacg ttcaccctct 18947 
accttagcat cccttccctt tgcaaatagt cctcttccaa caataataat gtcagatcct 19007 
gtagagacca catcatccac ggttctatac tgttgaccca atgcgtctcc cttgtcatct 19067 
aaacccacac cgggtgtcat aatcaaccaa tcgtaacctt catctcttcc acccatgtct 19127 
ctttgagcaa taaagccgat aacaaaatct ttgtcgctct tcgcaatgtc aacagtaccc 19187 
ttagtatatt ctccagtaga tagggagccc ttgcatgaca attctgctaa catcaaaagg 19247 
cctctaggtt cctttgttac ttcttctgcc gcctgcttca aaccgctaac aatacctggg 19307 
cccaccacac cgtgtgcatt cgtaatgtct gcccattctg ctattctgta tacacccgca 19367 
gagtactgca atttgactgt attaccaatg tcagcaaatt ttctgtcttc gaagagtaaa 19427 
aaattgtact tggcggataa tgcctttagc ggcttaactg tgccctccat ggaaaaatca 19487 
gtcaagatat ccacatgtgt ttttagtaaa caaattttgg gacctaatgc ttcaactaac 19547 
tccagtaatt ccttggtggt acgaacatcc aatgaagcac acaagtttgt ttgcttttcg 19607 
tgcatgatat taaatagctt ggcagcaaca ggactaggat gagtagcagc acgttcctta 19667 
tatgtagctt tcgacatgat ttatcttcgt ttcctgcagg tttttgttct gtgcagttgg 19727 
gttaagaata ctgggcaatt tcatgtttct tcaacactac atatgcgtat atataccaat 19787 
ctaagtctgt gctccttcct tcgttcttcc ttctgttcgg agattaccga atcaaaaaaa 19847 
tttcaaggaa accgaaatca aaaaaaagaa taaaaaaaaa atgatgaatt gaaaagctta 19907 
tcgat 19912



<210> 9

<211> 1771

<212> PRT

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: pd.deltaNS3NS5

<400> 9

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255
Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575
Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895






Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215
Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535
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Glu Tyr Asp Leu Glu Leu lie Thr Ser Cys Ser Ser Asn Val Ser Val 
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys TVrg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn lie lie Met Phe Ala Pro Thr Leu Trp 
585 1590 1595 1600 

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 10 

<211> 19798 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj 

<220> 

<221> CDS 

<222> (12679)..(17991)

<400> 10
atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
^99cggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340 
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520 
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880 
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120 
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240 
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420 
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480 
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660 

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tec aca gtc act gag age gac ate egt aeg gag gag gea ate 
Phe Asp Ser Thr Val Thr Glu Ser Asp He Arg Thr Glu Glu Ala lie 
1405 1410 1415 



16935 
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tac caa tgt tgt gac etc gac ccc caa gcc cgc gtg gcc ate aag tec 
Tyr Gin Cys Cys Asp Leu Asp Pro Gin Ala Arg Val Ala lie Lys Ser 
1420 1425 1430 1435 



16983 



ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770




tgaatagtcg actttgttcc cactgtactt ttagctcgta caaaatacaa tatacttttc 18051
atttctccgt aaacaacatg ttttcccatg taatatcctt ttctattttt cgttccgtta 18111
ccaactttac acatacttta tatagctatt cacttctata cactaaaaaa ctaagacaat 18171
tttaattttg ctgcctgcca tatttcaatt tgttataaat tcctataatt tatcctatta 18231
gtagctaaaa aaagatgaat gtgaatcgaa tcctaagaga attggatctg atccacagga 18291
cgggtgtggt cgccatgatc gcgtagtcga tagtggctcc aagtagcgaa gcgagcagga 18351
ctgggcggcg gccaaagcgg tcggacagtg ctccgagaac gggtgcgcat agaaattgca 18411
tcaacgcata tagcgctagc agcacgccat agtgactggc gatgctgtcg gaatggacga 18471
tatcccgcaa gaggcccggc agtaccggca taaccaagcc tatgcctaca gcatccaggg 18531
tgacggtgcc gaggatgacg atgagcgcat tgttagattt catacacggt gcctgactgc 18591
gttagcaatt taactgtgat aaactaccgc attaaagctt tttctttcca attttttttt 18651
tttcgtcatt ataaaaatca ttacgaccga gattcccggg taataactga tataattaaa 18711
ttgaagctct aatttgtgag tttagtatac atgcatttac ttataataca gttttttagt 18771
tttgctggcc gcatcttctc aaatatgctt cccagcctgc ttttctgtaa cgttcaccct 18831
ctaccttagc atcccttccc tttgcaaata gtcctcttcc aacaataata atgtcagatc 18891
ctgtagagac cacatcatcc acggttctat actgttgacc caatgcgtct cccttgtcat 18951
ctaaacccac accgggtgtc ataatcaacc aatcgtaacc ttcatctctt ccacccatgt 19011


ctctttgagc aataaagccg ataacaaaat ctttgtcgct cttcgcaatg tcaacagtac 19071
ccttagtata ttctccagta gatagggagc ccttgcatga caattctgct aacatcaaaa 19131
ggcctctagg ttcctttgtt acttcttctg ccgcctgctt caaaccgcta acaatacctg 19191
ggcccaccac accgtgtgca ttcgtaatgt ctgcccattc tgctattctg tatacacccg 19251
cagagtactg caatttgact gtattaccaa tgtcagcaaa ttttctgtct tcgaagagta 19311
aaaaattgta cttggcggat aatgccttta gcggcttaac tgtgccctcc atggaaaaat 19371
cagtcaagat atccacatgt gtttttagta aacaaatttt gggacctaat gcttcaacta 19431
actccagtaa ttccttggtg gtacgaacat ccaatgaagc acacaagttt gtttgctttt 19491
cgtgcatgat attaaatagc ttggcagcaa caggactagg atgagtagca gcacgttcct 19551
tatatgtagc tttcgacatg atttatcttc gtttcctgca ggtttttgtt ctgtgcagtt 19611
gggttaagaa tactgggcaa tttcatgttt cttcaacact acatatgcgt atatatacca 19671
atctaagtct gtgctccttc cttcgttctt ccttctgttc ggagattacc gaatcaaaaa 19731
aatttcaagg aaaccgaaat caaaaaaaag aataaaaaaa aaatgatgaa ttgaaaagct 19791
tatcgat 19798



<210> 11 
<211> 1771 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj

<400> 11 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1765 1770



<210> 12 
<211> 20220 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.deltaNS3NS5.pj.core121

<220> 
<221> CDS 

<222> (12679)..(18354)



<400> 12 
atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960 
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020 
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080 
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140 
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200 
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260 
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320 
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440 
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500 
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560 
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620 
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680 
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740 
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800 
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860 
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920 
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980 
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040 
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100 
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160 
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220 
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280 
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340 
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400 
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460 
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520 
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580 
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640 
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440



WO 01/38360 PCT/US00/32326



accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160 
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220 
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280 
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340 
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400 
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460 
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520 
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820 
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940 
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300 
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700

tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 

tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 

cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880 

cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 

ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000

tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 

tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120

tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 

ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360 

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420 

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480 

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660 

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825



ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905



aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985



gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065



cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145



atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225



agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305



gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385



tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465



agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545



tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625



tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705



tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770



atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag taatagtcga ctttgttccc 18374
Arg Arg Arg Ser Arg Asn Leu Gly Lys
1885 1890

actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta aacaacatgt 18434

tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca catactttat 18494

atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc tgcctgccat 18554

atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa aagatgaatg 18614

tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc gccatgatcg 18674

cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg ccaaagcggt 18734

cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat agcgctagca 18794

gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag aggcccggca 18854

gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg aggatgacga 18914

tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt aactgtgata 18974

aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat 19034

tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt 19094

ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca 19154

aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct 19214

ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca 19274


cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca 19334

taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga 19394

taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag 19454

atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta 19514


cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat 19574

tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg 19634

tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata 19694

atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg 19754


tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg 19814

tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct 19874

tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga 19934

tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat 19994


ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc 20054

ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc 20114

aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20160



<210> 13 
<211> 1892 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core121

<400> 13 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95



Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro 
100 105 110 

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe 
195 200 205 

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr 
275 280 285 

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn 
290 295 300 

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380



Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala 
420 425 430 

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val 
690 695 700 



Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu 
835 840 845 

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro 
885 890 895 

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His 
900 905 910 

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser 
945 950 955 960 

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala lie Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr 
1700 1705 1710 

Lys Leu Lys Leu Thr Pro He Ala Ala Ala Gly Gin Leu Asp Leu Ser 
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp He Tyr His Ser Val 
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp He Trp Phe Cys Leu Leu Leu Leu Ala 
745 1750 1755 1760 

Ala Gly Val Gly He Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro 
1765 1770 1775 

Lys Pro Gin Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gin Asp 
1780 1785 1790 

Val Lys Phe Pro Gly Gly Gly Gin He Val Gly Gly Val Tyr Leu Leu 
1795 1800 1805 

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser 
1810 1815 1820 

Glu Arg Ser Gin Pro Arg Gly Arg Arg Gin Pro He Pro Lys Ala Arg 
825 1830 1835 1840 

Arg Pro Glu Gly Arg Thr Trp Ala Gin Pro Gly Tyr Pro Trp Pro Leu 
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870 

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys 
1890 



<210> 14
<211> 20316
<212> DNA
<213> Artificial Sequence

<220>
<223> Description of Artificial Sequence: pd.delta.NS3NS5.pj.core173

<220>
<221> CDS
<222> (12679)..(18510)

<400> 14

atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300 
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420 
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020 
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080 
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140 
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200 
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260 
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320 
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240
tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300
aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360
caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420
tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480
ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540
gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600
agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660
acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val
1 5 10

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235
acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475
ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715
ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955
ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195
gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435
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etc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tea agg ggg 
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly 
1440 1445 1450 



17031 



gag aac tgc ggc tat cgc agg tgc cgc gcg age ggc gta ctg aca act 
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr 
1455 1460 1465 



17079 



age tgt ggt aac acc etc act tgc tac ate aag gee egg gea gee tgt 
Ser Cys Gly Asn Thr Leu Thr Cys Tyr lie Lys Ala Arg Ala Ala Cys 
1470 1475 1480 



17127 



cga gee gca ggg etc cag gae tgc acc atg etc gtg tgt ggc gac gae 
Arg Ala Ala Gly Leu Gin Asp Cys Thr Met Leu Val Cys Gly Asp Asp 
1485 1490 1495 



17175 



tta gte gtt ate tgt gaa age gcg ggg gtc cag gag gac gcg geg age 
Leu Val Val lie Cys Glu Ser Ala Gly Val Gin Glu Asp Ala Ala Ser 
1500 1505 1510 1515 



17223 



ctg aga gee ttc aeg gag get atg acc agg tac tec gee cec cct ggg 
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly 
1520 1525 1530 



17271 



gac cec cca caa eca gaa tac gac ttg gag etc ata aca tea tgc tec 
Asp Pro Pro Gin Pro Glu Tyr Asp Leu Glu Leu lie Thr Ser Cys Ser 
1535 1540 1545 



17319 



tec aac gtg tea gtc gee eac gac ggc get gga aag agg gtc tac tac 
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr 
1550 1555 1560 



17367 



etc acc cgt gac cct aca acc cec etc gcg aga get geg tgg gag aca 
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr 
1565 1570 1575 



17415 



gca aga cae act cca gtc aat tee tgg eta ggc aac ata ate atg ttt 
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn lie lie Met Phe 
1580 1585 1590 1595 



17463 



gee cec aca ctg tgg geg agg atg ata ctg atg ace cat ttc ttt age 
Ala Pro Thr Leu Trp Ala Arg Met lie Leu Met Thr His Phe Phe Ser 
1600 1605 1610 



17511 



gtc ctt ata gee agg gac eag ctt gaa cag gee etc gat tgc gag ate 
Val Leu lie Ala Arg Asp Gin Leu Glu Gin Ala Leu Asp Cys Glu lie 
1615 1620 1625 



17559 



tac ggg gcc tgc tac tec ata gaa cca ctg gat eta cct cca ate att 
Tyr Gly Ala Cys Tyr Ser lie Glu Pro Leu Asp Leu Pro Pro He He 
1630 1635 1640 



17607 



caa aga etc cat ggc etc age gea ttt tea etc eac agt tac tet eca 
Gin Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro 
1645 1650 1655 



17655 



ggt gaa ate aat agg gtg gcc gca tgc etc aga aaa ctt ggg gta ccg 
Gly Glu He Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro 
1660 1665 1670 1675 



17703 
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ccc ttg cga get tgg aga cac egg gee egg age gte cgc get agg ctt 
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu 
1680 1685 1690 



17751 



etg gee aga gga ggc agg get gee ata tgt ggc aag tac ete tte aac 
Leu Ala Arg Gly Gly Arg Ala Ala lie Cys Gly Lys Tyr Leu Phe Asn 
1695 1700 1705 



17799 



tgg gca gta aga aca aag etc aaa etc act eca ata gcg gee get ggc 
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro He Ala Ala Ala Gly 
1710 1715 1720 



17847 



eag etg gac ttg tee ggc tgg tte aeg get ggc tac age ggg gga gac 
Gin Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp 
1725 1730 1735 



17895 



att tat eac age gtg tct cat gee egg ecc cgc tgg ate tgg ttt tge 
He Tyr His Ser Val Ser His Ala Arg Pro Arg Trp He Trp Phe Cys 
1740 1745 1750 1755 



17943 



eta etc etg ctt get gca ggg gta ggc ate tac etc etc eee aac cga 
Leu Leu Leu Leu Ala Ala Gly Val Gly He Tyr Leu Leu Pro Asn Arg 
1760 1765 1770 



17991 



atg age acg aat cet aaa cet eaa aga aag ace aaa cgt aae acc aac 
Met Ser Thr Asn Pro Lys Pro Gin Arg Lys Thr Lys Arg Asn Thr Asn 
1775 1780 1785 



18039 



egg egg eeg eag gae gtc aag tte ccg ggt ggc ggt cag ate gtt ggt 
Arg Arg Pro Gin Asp Val Lys Phe Pro Gly Gly Gly Gin He Val Gly 
1790 1795 1800 



18087 



gga gtt tac ttg ttg ccg cgc agg ggc cet aga ttg ggt gtg ege gcg 
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala 
1805 1810 1815 



18135 



aeg aga aag act tee gag egg teg eaa cet cga ggt aga cgt eag cet 
Thr Arg Lys Thr Ser Glu Arg Ser Gin Pro Arg Gly Arg Arg Gin Pro 
1820 1825 1830 1835 



18183 



ate cce aag get cgt egg ccc gag ggc agg ace tgg get eag ecc ggg 
He Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gin Pro Gly 
1840 1845 1850 



18231 



tae ect tgg ecc ete tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp 
1855 1860 1865 



18279 



etc etg tct ccc cgt ggc tct egg cet age tgg ggc cce aca gae eee 
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro 
1870 1875 1880 



18327 



egg cgt agg teg ege aat ttg ggt aag gtc ate gat ace ett acg tge 
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val He Asp Thr Leu Thr Cys 
1885 1890 1895 



18375 



ggc tte gee gae etc atg ggg tae ata ccg etc gte ggc gee cet ett 
Gly Phe Ala Asp Leu Met Gly Tyr He Pro Leu Val Gly Ala Pro Leu 
1900 1905 1910 1915 



18423 
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gga ggc get gcc agg gcc ctg gcg cat ggc gtc egg gtt ctg gaa gac 18471 
Gly Gly Ala Ala Arg Ala Leu Ala His Gly Val Arg Val Leu Glu Asp 
1920 1925 1930 

ggc gtg aac tat gca aca ggg aac ctt cct ggt tgc tot taatagtcga 18520 
Gly Val Asn Tyr Ala Thr Gly Asn Leu Pro Gly Cys Ser 
1935 1940 



ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18580
aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18640
catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18700
tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18760
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18820
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18880
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18940
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 19000
aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 19060
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19120
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19180
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19240
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19300
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19360
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19420
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19480
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19540
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19600
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19660
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19720
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19780
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19840
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19900
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19960
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 20020
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 20080
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20140
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20200
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20260
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20316



<210> 15 

<211> 1944 

<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
pd.delta.NS3NS5.pj.core173

<400> 15 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525






Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys
1125 1130 1135

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu
1140 1145 1150

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val
1155 1160 1165

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr
1170 1175 1180

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr
1445 1450 1455

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr
1460 1465 1470

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645



Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680



Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly
1685 1690 1695

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala Leu Ala His Gly Val Arg Val Leu Glu Asp Gly Val Asn Tyr Ala
1925 1930 1935

Thr Gly Asn Leu Pro Gly Cys Ser
1940



<210> 16 

<211> 20217 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:
pd.delta.NS3NS5.pj.core140

<220> 

<221> CDS 

<222> (12679)..(18411)

<400> 16 



atcgatccta ccccttgcgc taaagaagta tatgtgccta ctaacgcttg tctttgtctc 60
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120
cggatcaacg ttcttaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180
gaattggtat aaagtttttg tttttgtaaa tctcgaagta tactcaaacg aatttagtat 240
tttctcagtg atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300
actgtggcta tttcccttat ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360
tacaatatca gtgatatcag attgatgttt ttgtccatag taaggaataa ttgtaaattc 420
ccaagcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480
tttaaactgg agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540
atagctcatg aatgtggctc tcttgattgc tgttccgtta tgtgtaatca tccaacataa 600
ataggttagt tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660
cacaaactga cgaacaagca ccttaggtgg tgttttacat aatatatcaa attgtggcat 720
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260
ccatggaatc tctcacaacc ggtaggccgt ctggtcttga ttctagcgca gaagattcca 1320
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100
gggataatat cacaggaggt actagactac ctttcatcct acataaatag acgcatataa 2160
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat atagggatat 6660
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860 
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920 
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980 
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040 
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100 
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160 
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220 
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280 
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340 
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400 
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460 
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520 
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820 
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120 
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300 
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420 
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920


atgccgccgg 


aagcgagaag aatcataatg 


gggaaggcca tccagcctcg cgtcgcgaac 


10980 


gccagcaaga 


cgtagcccag cgcgtcggcc 


gccatgccgg cgataatggc 


ctgcttctcg 


11040 


ccgaaacgtt 


tggtggcggg accagtgacg 


aaggcttgag cgagggcgtg 


caagattccg 


11100 


aataccgcaa 


gcgacaggcc gatcatcgtc 


gcgctccagc gaaagcggtc 


ctcgccgaaa 


11160 


atgacccaga 


gcgctgccgg cacctgtcct 


acgagttgca tgataaagaa gacagtcata 


11220 


agtgcggcga 


cgatagtcat 


gccccgcgcc 


caccggaagg agctgactgg gttgaaggct 


11280 


ctcaagggca 


tcggtcgagg atccttcaat 


atgcgcacat acgctgttat 


gttcaaggtc 


11340 
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ccttcgttta 


agaacgaaag 


cggtcttcct 


tttgagggat gtttcaagtt 


gttcaaatct 


11400 


atcaaatttg 


caaatcccca 


gtctgtatct 


agagcgttga 


atcggtgatg 


cgatttgtta 


11460 


attaaattga 


tggtgtcacc 


attaccaggt 


ctagatatac 


caatggcaaa 


ctgagcacaa 


11520 


caataccagt 


ccggatcaac 


tggcaccatc 


tctcccgtag 


tctcatctaa 


tttttcttcc 


11580 


ggatgaggtt 


ccagatatac 


cgcaacacct 


ttattatggt 


ttccctgagg 


gaataataga 


11640 


atgtcccatt 


cgaaatcacc 


aattctaaac 


ctgggcgaat 


tgtatttcgg 


gtttgttaac 


11700 


tcgttccagt 


caggaatgtt 


ccacgtgaag 


ctatcttcca 


gcaaagtctc 


cacttcttca 


11760 


tcaaattgtg 


gagaatactc 


ccaatgctct 


tatctatggg 


acttccggga 


aacacagtac 


11820 


cgatacttcc 


caattcgtct 


tcagagctca 


ttgtttgttt 


gaagagacta 


atcaaagaat 


11880 


cgttttctca 


aaaaaattaa 


tatcttaact 


gatagtttga 


tcaaaggggc 


aaaacgtagg 


11940 


ggcaaacaaa 


cggaaaaatc 


gtttctcaaa 


ttttctgatg 


ccaagaactc 


taaccagtct 


12000 


tatctaaaaa 


ttgccttatg 


atccgtctct 


ccggttacag 


cctgtgtaac 


tgattaatcc 


12060 


tgcctttcta 


atcaccattc 


taatgtttta 


attaagggat 


tttgtcttca 


ttaacggctt 


12120 


tcgctcataa 


aaatgttatg 


acgttttgcc 


cgcaggcggg 


aaaccatcca 


cttcacgaga 


12180 


ctgatctcct 


ctgccggaac 


accgggcatc 


tccaacttat 


aagttggaga 


aataagagaa 


12240 


tttcagattg 


agagaatgaa 


aaaaaaaaac 


ccttagttca 


taggtccatt 


ctcttagcgc 


12300 


aactacagag 


aacaggggca 


caaacaggca 


aaaaacgggc 


acaacctcaa 


tggagtgatg 


12360 


caacctgcct 


ggagtaaatg 


atgacacaag 


gcaattgacc 


cacgcatgta 


tctatctcat 


12420 


tttcttacac 


cttctattac 


cttctgctct 


ctctgatttg 


gaaaaagctg 


aaaaaaaagg 


12480 


ttgaaaccag 


ttccctgaaa 


ttattcccct 


acttgactaa 


taagtatata 


aagacggtag 


12540 


gtattgattg 


taattctgta 


aatctatttc 


ttaaacttct 


taaattctac 


ttttatagtt 


12600 


agtctttttt 


ttagttttaa 


aacaccaaga 


acttagtttc 


gaataaacac 


acataaacaa 


12660 


acaagcttac 


aaaacaaa atg get gca i 


tat gca get 


cag ggc tat aag gtg 


12711 



Met Ala Ala Tyr Ala Ala Gin Gly Tyr Lys Val 
15 10 



cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr
15 20 25

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg
30 35 40

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe
45 50 55

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys
60 65 70 75

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr
80 85 90

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala
95 100 105

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu
110 115 120

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala
125 130 135

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His
140 145 150 155

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly
160 165 170

atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro
175 180 185

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly
190 195 200

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr
205 210 215

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile
220 225 230 235

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr
240 245 250

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg
255 260 265

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala
270 275 280

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu
285 290 295

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu
300 305 310 315

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His
320 325 330

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val
335 340 345

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser
350 355 360

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His
365 370 375

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile
380 385 390 395

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala
400 405 410

gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu
415 420 425

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val
430 435 440

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu
445 450 455

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu
460 465 470 475

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys
480 485 490

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala
495 500 505

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys
510 515 520

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser
525 530 535

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala
540 545 550 555

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile
560 565 570

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr
575 580 585

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly
590 595 600

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val
605 610 615

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser
620 625 630 635

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala
640 645 650

ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly
655 660 665

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala
670 675 680

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp
685 690 695

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln
700 705 710 715

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro
720 725 730

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val
735 740 745

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu
750 755 760

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp
765 770 775



WO 01/38360

PCT/US00/32326



cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile
780 785 790 795

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr
800 805 810

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr
815 820 825

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp
830 835 840

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe
845 850 855

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln
860 865 870 875

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His
880 885 890

agg ttt gcg ccc ccc tgc aag ccc ttg ctg cgg gag gag gta tca ttc 15399
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe
895 900 905

aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015






gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240



tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480



cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc taatagtcga 18421
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val
1900 1905 1910

ctttgttccc actgtacttt tagctcgtac aaaatacaat atacttttca tttctccgta 18481
aacaacatgt tttcccatgt aatatccttt tctatttttc gttccgttac caactttaca 18541
catactttat atagctattc acttctatac actaaaaaac taagacaatt ttaattttgc 18601
tgcctgccat atttcaattt gttataaatt cctataattt atcctattag tagctaaaaa 18661
aagatgaatg tgaatcgaat cctaagagaa ttggatctga tccacaggac gggtgtggtc 18721
gccatgatcg cgtagtcgat agtggctcca agtagcgaag cgagcaggac tgggcggcgg 18781
ccaaagcggt cggacagtgc tccgagaacg ggtgcgcata gaaattgcat caacgcatat 18841
agcgctagca gcacgccata gtgactggcg atgctgtcgg aatggacgat atcccgcaag 18901






aggcccggca gtaccggcat aaccaagcct atgcctacag catccagggt gacggtgccg 18961
aggatgacga tgagcgcatt gttagatttc atacacggtg cctgactgcg ttagcaattt 19021
aactgtgata aactaccgca ttaaagcttt ttctttccaa tttttttttt ttcgtcatta 19081
taaaaatcat tacgaccgag attcccgggt aataactgat ataattaaat tgaagctcta 19141
atttgtgagt ttagtataca tgcatttact tataatacag ttttttagtt ttgctggccg 19201
catcttctca aatatgcttc ccagcctgct tttctgtaac gttcaccctc taccttagca 19261
tcccttccct ttgcaaatag tcctcttcca acaataataa tgtcagatcc tgtagagacc 19321
acatcatcca cggttctata ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca 19381
ccgggtgtca taatcaacca atcgtaacct tcatctcttc cacccatgtc tctttgagca 19441
ataaagccga taacaaaatc tttgtcgctc ttcgcaatgt caacagtacc cttagtatat 19501
tctccagtag atagggagcc cttgcatgac aattctgcta acatcaaaag gcctctaggt 19561
tcctttgtta cttcttctgc cgcctgcttc aaaccgctaa caatacctgg gcccaccaca 19621
ccgtgtgcat tcgtaatgtc tgcccattct gctattctgt atacacccgc agagtactgc 19681
aatttgactg tattaccaat gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac 19741
ttggcggata atgcctttag cggcttaact gtgccctcca tggaaaaatc agtcaagata 19801
tccacatgtg tttttagtaa acaaattttg ggacctaatg cttcaactaa ctccagtaat 19861
tccttggtgg tacgaacatc caatgaagca cacaagtttg tttgcttttc gtgcatgata 19921
ttaaatagct tggcagcaac aggactagga tgagtagcag cacgttcctt atatgtagct 19981
ttcgacatga tttatcttcg tttcctgcag gtttttgttc tgtgcagttg ggttaagaat 20041
actgggcaat ttcatgtttc ttcaacacta catatgcgta tatataccaa tctaagtctg 20101
tgctccttcc ttcgttcttc cttctgttcg gagattaccg aatcaaaaaa atttcaagga 20161
aaccgaaatc aaaaaaaaga ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20217



<210> 17 
<211> 1911 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core140

<400> 17 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320



Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gin Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr 
965 970 975 
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Ala Asn His Asp Ser Pro Asp Ala Glu Leu lie Glu Ala Asn Leu Leu 
980 985 990 

Trp Arg Gin Glu Met Gly Gly Asn He Thr Arg Val Glu Ser Glu Asn 
995 1000 1005 

Lys Val Val He Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp 
1010 1015 1020 

Glu Arg Glu He Ser Val Pro Ala Glu He Leu Arg Lys Ser Arg Arg 
025 1030 1035 1040 

Phe Ala Gin Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro 
1045 1050 1055 

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His 
1060 1065 1070 

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg 
1075 1080 1085 



Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu 
1090 1095 1100 

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile 
1105 1110 1115 1120 

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 



Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys 
1185 1190 1195 1200 

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu 
1205 1210 1215 

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val 
1220 1225 1230 

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu 
1235 1240 1245 

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser 
1250 1255 1260 

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys 
1265 1270 1275 1280 

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val 
1285 1290 1295 

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr 
1300 1305 1310 

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln 
1315 1320 1325 

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp 
1330 1335 1340 

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr 
1345 1350 1355 1360 

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser 
1365 1370 1375 

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys 
1380 1385 1390 

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val 
1395 1400 1405 

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp 
1410 1415 1420 

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu 
1425 1430 1435 1440 

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu 
1475 1480 1485 

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys 
1490 1495 1500 

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr 
1505 1510 1515 1520 

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro 
1525 1530 1535 

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val 
1540 1545 1550 

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro 
1555 1560 1565 

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro 
1570 1575 1580 

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp 
1585 1590 1595 1600 

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg 
1605 1610 1615 

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr 
1620 1625 1630 

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly 
1635 1640 1645 

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg 
1650 1655 1660 

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp 
1665 1670 1675 1680 

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr 
1700 1705 1710 

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser 
1715 1720 1725 

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val 
1730 1735 1740 

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala 
1745 1750 1755 1760 

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro 
1765 1770 1775 

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp 
1780 1785 1790 

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu 
1795 1800 1805 

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser 
1810 1815 1820 

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg 
1825 1830 1835 1840 

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu 
1845 1850 1855 

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg 
1860 1865 1870 

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg 
1875 1880 1885 

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu 
1890 1895 1900 

Met Gly Tyr Ile Pro Leu Val 
1905 1910 



<210> 18 
<211> 20247 
<212> DNA 
<213> Artificial Sequence 

<220> 
<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj .corelSO 

<220> 
<221> CDS 
<222> (12679)..(18441) 

<400> 18 



atc^atccta ccccttgcgc taaagaagta catgtgccta ctaacgcttg tctttgtctc 60 
tgtcactaaa cactggatta ttactcccag atacttattt tggactaatt taaatgattt 120 
cggatcaac3 ttcutaatat cgctgaatct tccacaattg atgaaagtag ctaggaagag 180 
^^auC^^^au aaagcccccg uucccguaaa tctcgaagta tactcaaacg aatttagtat 240 
atctcccaga tgctttcacc ctcacttaga agtgctttaa gcattttttt 300 
actgtggcta cuccccccac ctgcttcttc cgatgattcg aactgtaatt gcaaactact 360 
tacaacacca gtgatatcag attgatgttt ccgtccatag taaggaataa ttgtaaattc 420 
ccactgcagga atcaatttct ttaatgaggc ttccagaatt gttgcttttt gcgtcttgta 480 
ctcaaacu^g agtgatttat tgacaatatc gaaactcagc gaattgctta tgatagtatt 540 
aatgtggctc tcttgattgc cgccccgEua tgtgtaatca tccaacataa 600 
tcagcagcac ataatgctat tttctcacct gaaggtcttt caaacctttc 660 
cgaacaagca ccttaggtgg cgcccuacac aatatatcaa attgtggcat 720 
gcttagcgcc gatcttgtgt gcaattgata tctagtttca actactctat ttatcttgta 780 
tcttgcagta ttcaaacacg ctaactcgaa aaactaactt taattgtcct gtttgtctcg 840 
cgttctttcg aaaaatgcac cggccgcgca ttatttgtac tgcgaaaata attggtactg 900 
cggtatcttc atttcatatt ttaaaaatgc acctttgctg cttttcctta atttttagac 960 
ggcccgcagg ttcgttttgc ggtactatct tgtgataaaa agttgttttg acatgtgatc 1020 
tgcacagatt ttataatgta ataagcaaga atacattatc aaacgaacaa tactggtaaa 1080 
agaaaaccaa aatggacgac attgaaacag ccaagaatct gacggtaaaa gcacgtacag 1140 
cttatagcgt ctgggatgta tgtcggctgt ttattgaaat gattgctcct gatgtagata 1200 
ttgatataga gagtaaacgt aagtctgatg agctactctt tccaggatat gtcataaggc 1260 
ccatggaatc tctcacaacc ggtaggccgt atggtcttga ttctagcgca gaagattcca 1320 
gcgtatcttc tgactccagt gctgaggtaa ttttgcctgc tgcgaagatg gttaaggaaa 1380 
ggtttgattc gattggaaat ggtatgctct cttcacaaga agcaagtcag gctgccatag 1440 
atttgatgct acagaataac aagctgttag acaatagaaa gcaactatac aaatctattg 1500 
ctataataat aggaagattg cccgagaaag acaagaagag agctaccgaa atgctcatga 1560 
gaaaaatgga ttgtacacag ttattagtcc caccagctcc aacggaagaa gatgttatga 1620 
agctcgtaag cgtcgttacc caattgctta ctttagttcc accagatcgt caagctgctt 1680 
taataggtga tttattcatc ccggaatctc taaaggatat attcaatagt ttcaatgaac 1740 
tggcggcaga gaatcgttta cagcaaaaaa agagtgagtt ggaaggaagg actgaagtga 1800 
accatgctaa tacaaatgaa gaagttccct ccaggcgaac aagaagtaga gacacaaatg 1860 
caagaggagc atataaatta caaaacacca tcactgaggg ccctaaagcg gttcccacga 1920 
aaaaaaggag agtagcaacg agggtaaggg gcagaaaatc acgtaatact tctagggtat 1980 
gatccaatat caaaggaaat gatagcattg aaggatgaga ctaatccaat tgaggagtgg 2040 
cagcatatag aacagctaaa gggtagtgct gaaggaagca tacgataccc cgcatggaat 2100 
gggataatat cacaggaggt actragactac ctttcatcct acataaatag acgcatataa 2160 
gtacgcattt aagcataaac acgcactatg ccgttcttct catgtatata tatatacagg 2220 
caacacgcag atataggtgc gacgtgaaca gtgagctgta tgtgcgcagc tcgcgttgca 2280 
ttttcggaag cgctcgtttt cggaaacgct ttgaagttcc tattccgaag ttcctattct 2340 
ctagaaagta taggaacttc agagcgcttt tgaaaaccaa aagcgctctg aagacgcact 2400 
ttcaaaaaac caaaaacgca ccggactgta acgagctact aaaatattgc gaataccgct 2460 
tccacaaaca ttgctcaaaa gtatctcttt gctatatatc tctgtgctat atccctatat 2520 
aacctaccca tccacctttc gctccttgaa cttgcatcta aactcgacct ctacatcaac 2580 
aggcttccaa tgctcttcaa attttactgt caagtagacc catacggctg taatatgctg 2640 
ctcttcataa tgtaagctta tctttatcga atcgtgtgaa aaactactac cgcgataaac 2700 
ctttacggtt ccctgagatt gaattagttc ctttagtata tgatacaaga cacttttgaa 2760 
ctttgtacga cgaattttga ggttcgccat cctctggcta tttccaatta tcctgtcggc 2820 
tattatctcc gcctcagttt gatcttccgc ttcagactgc catttttcac ataatgaatc 2880 
tatttcaccc cacaatcctt catccgcctc cgcatcttgt tccgttaaac tattgacttc 2940 
atgttgtaca ttgtttagtt cacgagaagg gtcctcttca ggcggtagct cctgatctcc 3000 
tatatgacct ttatcctgtt ctctttccac aaacttagaa atgtattcat gaattatgga 3060 
gcacctaata acattcttca aggcggagaa gtttgggcca gatgcccaat atgcttgaca 3120 
tgaaaacgtg agaatgaatt tagtattatt gtgatattct gaggcaattt tattataatc 3180 
tcgaagataa gagaagaatg cagtgacctt tgtattgaca aatggagatt ccatgtatct 3240 
aaaaaatacg cctttaggcc ttctgatacc ctttcccctg cggtttagcg tgccttttac 3300 
attaatatct aaaccctctc cgatggtggc ctttaactga ctaataaatg caaccgatat 3360 
aaactgtgat aattctgggt gatttatgat tcgatcgaca attgtattgt acactagtgc 3420 
aggatcaggc caatccagtt ctttttcaat taccggtgtg tcgtctgtat tcagtacatg 3480 
tccaacaaat gcaaatgcta acgttttgta tttcttataa ttgtcaggaa ctggaaaagt 3540 
cccccttgtc gtctcgatta cacacctact ttcatcgtac accataggtt ggaagtgctg 3600 
cataatacat tgcttaatac aagcaagcag tctctcgcca ttcatatttc agttattttc 3660 
cattacagct gatgtcattg tatatcagcg ctgtaaaaat ctatctgtta cagaaggttt 3720 
tcgcggtttt tataaacaaa actttcgtta cgaaatcgag caatcacccc agctgcgtat 3780 
ttggaaattc gggaaaaagt agagcaacgc gagttgcatt ttttacacca taatgcatga 3840 
ttaacttcga gaagggatta aggctaattt cactagtatg tttcaaaaac ctcaatctgt 3900 
ccattgaatg ccttataaaa cagctataga ttgcatagaa gagttagcta ctcaatgctt 3960 
tttgtcaaag cttactgatg atgatgtgtc tactttcagg cgggtctgta gtaaggagaa 4020 
tgacattata aagctggcac ttagaattcc acggactata gactatacta gtatactccg 4080 
tctactgtac gatacacttc cgctcaggtc cttgtccttt aacgaggcct taccactctt 4140 
ttgttactct attgatccag ctcagcaaag gcagtgtgat ctaagattct atcttcgcga 4200 
tgtagtaaaa ctagctagac cgagaaagag actagaaatg caaaaggcac ttctacaatg 4260 
gctgccatca ttattatccg atgtgacgct gcattttttt tttttttttt tttttttttt 4320 
tttttttttt tttttttttt ttttttggta caaatatcat aaaaaaagag aatcttttta 4380 
agcaaggatt ttcttaactt cttcggcgac agcatcaccg acttcggtgg tactgttgga 4440 
accacctaaa tcaccagttc tgatacctgc atccaaaacc tttttaactg catcttcaat 4500 
ggctttacct tcttcaggca agttcaatga caatttcaac atcattgcag cagacaagat 4560 
agtggcgata gggttgacct tattctttgg caaatctgga gcggaaccat ggcatggttc 4620 
gtacaaacca aatgcggtgt tcttgtctgg caaagaggcc aaggacgcag atggcaacaa 4680 
acccaaggag cctgggataa cggaggcttc atcggagatg atatcaccaa acatgttgct 4740 
ggtgattata ataccattta ggtgggttgg gttcttaact aggatcatgg cggcagaatc 4800 
aatcaattga tgttgaactt tcaatgtagg gaattcgttc ttgatggttt cctccacagt 4860 
ttttctccat aatcttgaag aggccaaaac attagcttta tccaaggacc aaataggcaa 4920 
tggtggctca tgttgtaggg ccatgaaagc ggccattctt gtgattcttt gcacttctgg 4980 
aacggtgtat tgttcactat cccaagcgac accatcacca tcgtcttcct ttctcttacc 5040 
aaagtaaata cctcccacta attctctaac aacaacgaag tcagtacctt tagcaaattg 5100 
tggcttgatt ggagataagt ctaaaagaga gtcggatgca aagttacatg gtcttaagtt 5160 
ggcgtacaat tgaagttctt tacggatttt tagtaaacct tgttcaggtc taacactacc 5220 
ggtaccccat ttaggaccac ccacagcacc taacaaaacg gcatcagcct tcttggaggc 5280 
ttccagcgcc tcatctggaa gtggaacacc tgtagcatcg atagcagcac caccaattaa 5340 
atgattttcg aaatcgaact tgacattgga acgaacatca gaaatagctt taagaacctt 5400 
aatggcttcg gctgtgattt cttgaccaac gtggtcacct ggcaaaacga cgatcttctt 5460 
aggggcagac attacaatgg tatatccttg aaatatatat aaaaaaaaaa aaaaaaaaaa 5520 
aaaaaaaaaa atgcagcttc tcaatgatat tcgaatacgc tttgaggaga tacagcctaa 5580 
tatccgacaa actgttttac agatttacga tcgtacttgt tacccatcat tgaattttga 5640 
acatccgaac ctgggagttt tccctgaaac agatagtata tttgaacctg tataataata 5700 
tatagtctag cgctttacgg aagacaatgt atgtatttcg gttcctggag aaactattgc 5760 
atctattgca taggtaatct tgcacgtcgc atccccggtt cattttctgc gtttccatct 5820 
tgcacttcaa tagcatatct ttgttaacga agcatctgtg cttcattttg tagaacaaaa 5880 
atgcaacgcg agagcgctaa tttttcaaac aaagaatctg agctgcattt ttacagaaca 5940 
gaaatgcaac gcgaaagcgc tattttacca acgaagaatc tgtgcttcat ttttgtaaaa 6000 
caaaaatgca acgcgagagc gctaattttt caaacaaaga atctgagctg catttttaca 6060 
gaacagaaat gcaacgcgag agcgctattt taccaacaaa gaatctatac ttcttttttg 6120 
ttctacaaaa atgcatcccg agagcgctat ttttctaaca aagcatctta gattactttt 6180 
tttctccttt gtgcgctcta taatgcagtc tcttgataac tttttgcact gtaggtccgt 6240 
taaggttaga agaaggctac tttggtgtct attttctctt ccataaaaaa agcctgactc 6300 
cacttcccgc gtttactgat tactagcgaa gctgcgggtg cattttttca agataaaggc 6360 
atccccgatt atattctata ccgatgtgga ttgcgcatac tttgtgaaca gaaagtgata 6420 
gcgttgatga ttcttcattg gtcagaaaat tatgaacggt ttcttctatt ttgtctctat 6480 
atactacgta taggaaatgt ttacattttc gtattgtttt cgattcactc tatgaatagt 6540 
tcttactaca atttttttgt ctaaagagta atactagaga taaacataaa aaatgtagag 6600 
gtcgagttta gatgcaagtt caaggagcga aaggtggatg ggtaggttat s^tagggatat 6660 
agcacagaga tatatagcaa agagatactt ttgagcaatg tttgtggaag cggtattcgc 6720 
aatattttag tagctcgtta cagtccggtg cgtttttggt tttttgaaag tgcgtcttca 6780 
gagcgctttt ggttttcaaa agcgctctga agttcctata ctttctagag aataggaact 6840 
tcggaatagg aacttcaaag cgtttccgaa aacgagcgct tccgaaaatg caacgcgagc 6900 
tgcgcacata cagctcactg ttcacgtcgc acctatatct gcgtgttgcc tgtatatata 6960 
tatacatgag aagaacggca tagtgcgtgt ttatgcttaa atgcgtactt atatgcgtct 7020 
atttatgtag gatgaaaggt agtctagtac ctcctgtgat attatcccat tccatgcggg 7080 
gtatcgtatg cttccttcag cactaccctt tagctgttct atatgctgcc actcctcaat 7140 
tggattagtc tcatccttca atgctatcat ttcctttgat attggatcat atgcatagta 7200 
ccgagaaact agtgcgaagt agtgatcagg tattgctgtt atctgatgag tatacgttgt 7260 
cctggccacg gcagaagcac gcttatcgct ccaatttccc acaacattag tcaactccgt 7320 
taggcccttc attgaaagaa atgaggtcat caaatgtctt ccaatgtgag attttgggcc 7380 
attttttata gcaaagattg aataaggcgc atttttcttc aaagctttat tgtacgatct 7440 
gactaagtta tcttttaata attggtattc ctgtttattg cttgaagaat tgccggtcct 7500 
atttactcgt tttaggactg gttcagaatt cctcaaaaat tcatccaaat atacaagtgg 7560 
atcgatgata agctgtcaaa catgagaatt cttgaagacg aaagggcctc gtgatacgcc 7620 
tatttttata ggttaatgtc atgataataa tggtttctta gacgtcaggt ggcacttttc 7680 
ggggaaatgt gcgcggaacc cctatttgtt tatttttcta aatacattca aatatgtatc 7740 
cgctcatgag acaataaccc tgataaatgc ttcaataata ttgaaaaagg aagagtatga 7800 
gtattcaaca tttccgtgtc gcccttattc ccttttttgc ggcattttgc cttcctgttt 7860 
ttgctcaccc agaaacgctg gtgaaagtaa aagatgctga agatcagttg ggtgcacgag 7920 
tgggttacat cgaactggat ctcaacagcg gtaagatcct tgagagtttt cgccccgaag 7980 
aacgttttcc aatgatgagc acttttaaag ttctgctatg tggcgcggta ttatcccgtg 8040 
ttgacgccgg gcaagagcaa ctcggtcgcc gcatacacta ttctcagaat gacttggttg 8100 
agtactcacc agtcacagaa aagcatctta cggatggcat gacagtaaga gaattatgca 8160 
gtgctgccat aaccatgagt gataacactg cggccaactt acttctgaca acgatcggag 8220 
gaccgaagga gctaaccgct tttttgcaca acatggggga tcatgtaact cgccttgatc 8280 
gttgggaacc ggagctgaat gaagccatac caaacgacga gcgtgacacc acgatgcctg 8340 
cagcaatggc aacaacgttg cgcaaactat taactggcga actacttact ctagcttccc 8400 
ggcaacaatt aatagactgg atggaggcgg ataaagttgc aggaccactt ctgcgctcgg 8460 
cccttccggc tggctggttt attgctgata aatctggagc cggtgagcgt gggtctcgcg 8520 
gtatcattgc agcactgggg ccagatggta agccctcccg tatcgtagtt atctacacga 8580 
cggggagtca ggcaactatg gatgaacgaa atagacagat cgctgagata ggtgcctcac 8640 
tgattaagca ttggtaactg tcagaccaag tttactcata tatactttag attgatttaa 8700 
aacttcattt ttaatttaaa aggatctagg tgaagatcct ttttgataat ctcatgacca 8760 
aaatccctta acgtgagttt tcgttccact gagcgtcaga ccccgtagaa aagatcaaag 8820 
gatcttcttg agatcctttt tttctgcgcg taatctgctg cttgcaaaca aaaaaaccac 8880 
cgctaccagc ggtggtttgt ttgccggatc aagagctacc aactcttttt ccgaaggtaa 8940 
ctggcttcag cagagcgcag ataccaaata ctgtccttct agtgtagccg tagttaggcc 9000 
accacttcaa gaactctgta gcaccgccta catacctcgc tctgctaatc ctgttaccag 9060 
tggctgctgc cagtggcgat aagtcgtgtc ttaccgggtt ggactcaaga cgatagttac 9120 
cggataaggc gcagcggtcg ggctgaacgg ggggttcgtg cacacagccc agcttggagc 9180 
gaacgaccta caccgaactg agatacctac agcgtgagct atgagaaagc gccacgcttc 9240 
ccgaagggag aaaggcggac aggtatccgg taagcggcag ggtcggaaca ggagagcgca 9300 
cgagggagct tccaggggga aacgcctggt atctttatag tcctgtcggg tttcgccacc 9360 
tctgacttga gcgtcgattt ttgtgatgct cgtcaggggg gcggagccta tggaaaaacg 9420 
ccagcaacgc ggccttttta cggttcctgg ccttttgctg gccttttgct cacatgttct 9480 
ttcctgcgtt atcccctgat tctgtggata accgtattac cgcctttgag tgagctgata 9540 
ccgctcgccg cagccgaacg accgagcgca gcgagtcagt gagcgaggaa gcggaagagc 9600 
gcctgatgcg gtattttctc cttacgcatc tgtgcggtat ttcacaccgc atatggtgca 9660 
ctctcagtac aatctgctct gatgccgcat agttaagcca gtatacactc cgctatcgct 9720 
acgtgactgg gtcatggctg cgccccgaca cccgccaaca cccgctgacg cgccctgacg 9780 
ggcttgtctg ctcccggcat ccgcttacag acaagctgtg accgtctccg ggagctgcat 9840 
gtgtcagagg ttttcaccgt catcaccgaa acgcgcgagg cagctgcggt aaagctcatc 9900 
agcgtggtcg tgaagcgatt cacagatgtc tgcctgttca tccgcgtcca gctcgttgag 9960 
tttctccaga agcgttaatg tctggcttct gataaagcgg gccatgttaa gggcggtttt 10020 
ttcctgtttg gtcactgatg cctccgtgta agggggattt ctgttcatgg gggtaatgat 10080 
accgatgaaa cgagagagga tgctcacgat acgggttact gatgatgaac atgcccggtt 10140 
actggaacgt tgtgagggta aacaactggc ggtatggatg cggcgggacc agagaaaaat 10200 
cactcagggt caatgccagc gcttcgttaa tacagatgta ggtgttccac agggtagcca 10260 
gcagcatcct gcgatgcaga tccggaacat aatggtgcag ggcgctgact tccgcgtttc 10320 
cagactttac gaaacacgga aaccgaagac cattcatgtt gttgctcagg tcgcagacgt 10380 
tttgcagcag cagtcgcttc acgttcgctc gcgtatcggt gattcattct gctaaccagt 10440 
aaggcaaccc cgccagccta gccgggtcct caacgacagg agcacgatca tgcgcacccg 10500 
tggccaggac ccaacgctgc ccgagatgcg ccgcgtgcgg ctgctggaga tggcggacgc 10560 
gatggatatg ttctgccaag ggttggtttg cgcattcaca gttctccgca agaattgatt 10620 
ggctccaatt cttggagtgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 10680 
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 10740 
gcgcctacaa tccatgccaa cccgttccat gtgctcgccg aggcggcata aatcgccgtg 10800 
acgatcagcg gtccaatgat cgaagttagg ctggtaagag ccgcgagcga tccttgaagc 10860 
tgtccctgat ggtcgtcatc tacctgcctg gacagcatgg cctgcaacgc gggcatcccg 10920 
atgccgccgg aagcgagaag aatcataatg gggaaggcca tccagcctcg cgtcgcgaac 10980 
gccagcaaga cgtagcccag cgcgtcggcc gccatgccgg cgataatggc ctgcttctcg 11040 
ccgaaacgtt tggtggcggg accagtgacg aaggcttgag cgagggcgtg caagattccg 11100 
aataccgcaa gcgacaggcc gatcatcgtc gcgctccagc gaaagcggtc ctcgccgaaa 11160 
atgacccaga gcgctgccgg cacctgtcct acgagttgca tgataaagaa gacagtcata 11220 
agtgcggcga cgatagtcat gccccgcgcc caccggaagg agctgactgg gttgaaggct 11280 
ctcaagggca tcggtcgagg atccttcaat atgcgcacat acgctgttat gttcaaggtc 11340 
ccttcgttta agaacgaaag cggtcttcct tttgagggat gtttcaagtt gttcaaatct 11400 
atcaaatttg caaatcccca gtctgtatct agagcgttga atcggtgatg cgatttgtta 11460 
attaaattga tggtgtcacc attaccaggt ctagatatac caatggcaaa ctgagcacaa 11520 
caataccagt ccggatcaac tggcaccatc tctcccgtag tctcatctaa tttttcttcc 11580 
ggatgaggtt ccagatatac cgcaacacct ttattatggt ttccctgagg gaataataga 11640 
atgtcccatt cgaaatcacc aattctaaac ctgggcgaat tgtatttcgg gtttgttaac 11700 
tcgttccagt caggaatgtt ccacgtgaag ctatcttcca gcaaagtctc cacttcttca 11760 
tcaaattgtg gagaatactc ccaatgctct tatctatggg acttccggga aacacagtac 11820 
cgatacttcc caattcgtct tcagagctca ttgtttgttt gaagagacta atcaaagaat 11880 
cgttttctca aaaaaattaa tatcttaact gatagtttga tcaaaggggc aaaacgtagg 11940 
ggcaaacaaa cggaaaaatc gtttctcaaa ttttctgatg ccaagaactc taaccagtct 12000 
tatctaaaaa ttgccttatg atccgtctct ccggttacag cctgtgtaac tgattaatcc 12060 
tgcctttcta atcaccattc taatgtttta attaagggat tttgtcttca ttaacggctt 12120 
tcgctcataa aaatgttatg acgttttgcc cgcaggcggg aaaccatcca cttcacgaga 12180 
ctgatctcct ctgccggaac accgggcatc tccaacttat aagttggaga aataagagaa 12240 

tttcagattg agagaatgaa aaaaaaaaac ccttagttca taggtccatt ctcttagcgc 12300 

aactacagag aacaggggca caaacaggca aaaaacgggc acaacctcaa tggagtgatg 12360 

caacctgcct ggagtaaatg atgacacaag gcaattgacc cacgcatgta tctatctcat 12420 

tttcttacac cttctattac cttctgctct ctctgatttg gaaaaagctg aaaaaaaagg 12480 

ttgaaaccag ttccctgaaa ttattcccct acttgactaa taagtatata aagacggtag 12540 

gtattgattg taattctgta aatctatttc ttaaacttct taaattctac ttttatagtt 12600 

agtctttttt ttagttttaa aacaccaaga acttagtttc gaataaacac acataaacaa 12660 

acaagcttac aaaacaaa atg gct gca tat gca gct cag ggc tat aag gtg 12711 
Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val 
1 5 10 

cta gta ctc aac ccc tct gtt gct gca aca ctg ggc ttt ggt gct tac 12759 
Leu Val Leu Asn Pro Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr 
15 20 25 

atg tcc aag gct cat ggg atc gat cct aac atc agg acc ggg gtg aga 12807 
Met Ser Lys Ala His Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg 
30 35 40 

aca att acc act ggc agc ccc atc acg tac tcc acc tac ggc aag ttc 12855 
Thr Ile Thr Thr Gly Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe 
45 50 55 

ctt gcc gac ggc ggg tgc tcg ggg ggc gct tat gac ata ata att tgt 12903 
Leu Ala Asp Gly Gly Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys 
60 65 70 75 

gac gag tgc cac tcc acg gat gcc aca tcc atc ttg ggc att ggc act 12951 
Asp Glu Cys His Ser Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr 
80 85 90 

gtc ctt gac caa gca gag act gcg ggg gcg aga ctg gtt gtg ctc gcc 12999 
Val Leu Asp Gln Ala Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala 
95 100 105 

acc gcc acc cct ccg ggc tcc gtc act gtg ccc cat ccc aac atc gag 13047 
Thr Ala Thr Pro Pro Gly Ser Val Thr Val Pro His Pro Asn Ile Glu 
110 115 120 

gag gtt gct ctg tcc acc acc gga gag atc cct ttt tac ggc aag gct 13095 
Glu Val Ala Leu Ser Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala 
125 130 135 

atc ccc ctc gaa gta atc aag ggg ggg aga cat ctc atc ttc tgt cat 13143 
Ile Pro Leu Glu Val Ile Lys Gly Gly Arg His Leu Ile Phe Cys His 
140 145 150 155 

tca aag aag aag tgc gac gaa ctc gcc gca aag ctg gtc gca ttg ggc 13191 
Ser Lys Lys Lys Cys Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly 
160 165 170 
atc aat gcc gtg gcc tac tac cgc ggt ctt gac gtg tcc gtc atc ccg 13239 
Ile Asn Ala Val Ala Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro 
175 180 185 

acc agc ggc gat gtt gtc gtc gtg gca acc gat gcc ctc atg acc ggc 13287 
Thr Ser Gly Asp Val Val Val Val Ala Thr Asp Ala Leu Met Thr Gly 
190 195 200 

tat acc ggc gac ttc gac tcg gtg ata gac tgc aat acg tgt gtc acc 13335 
Tyr Thr Gly Asp Phe Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr 
205 210 215 

cag aca gtc gat ttc agc ctt gac cct acc ttc acc att gag aca atc 13383 
Gln Thr Val Asp Phe Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile 
220 225 230 235 

acg ctc ccc caa gat gct gtc tcc cgc act caa cgt cgg ggc agg act 13431 
Thr Leu Pro Gln Asp Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr 
240 245 250 

ggc agg ggg aag cca ggc atc tac aga ttt gtg gca ccg ggg gag cgc 13479 
Gly Arg Gly Lys Pro Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg 
255 260 265 

ccc tcc ggc atg ttc gac tcg tcc gtc ctc tgt gag tgc tat gac gca 13527 
Pro Ser Gly Met Phe Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala 
270 275 280 

ggc tgt gct tgg tat gag ctc acg ccc gcc gag act aca gtt agg cta 13575 
Gly Cys Ala Trp Tyr Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu 
285 290 295 

cga gcg tac atg aac acc ccg ggg ctt ccc gtg tgc cag gac cat ctt 13623 
Arg Ala Tyr Met Asn Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu 
300 305 310 315 

gaa ttt tgg gag ggc gtc ttt aca ggc ctc act cat ata gat gcc cac 13671 
Glu Phe Trp Glu Gly Val Phe Thr Gly Leu Thr His Ile Asp Ala His 
320 325 330 

ttt cta tcc cag aca aag cag agt ggg gag aac ctt cct tac ctg gta 13719 
Phe Leu Ser Gln Thr Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val 
335 340 345 

gcg tac caa gcc acc gtg tgc gct agg gct caa gcc cct ccc cca tcg 13767 
Ala Tyr Gln Ala Thr Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser 
350 355 360 

tgg gac cag atg tgg aag tgt ttg att cgc ctc aag ccc acc ctc cat 13815 
Trp Asp Gln Met Trp Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His 
365 370 375 

ggg cca aca ccc ctg cta tac aga ctg ggc gct gtt cag aat gaa atc 13863 
Gly Pro Thr Pro Leu Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile 
380 385 390 395 

acc ctg acg cac cca gtc acc aaa tac atc atg aca tgc atg tcg gcc 13911 
Thr Leu Thr His Pro Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala 
400 405 410 
gac ctg gag gtc gtc acg agc acc tgg gtg ctc gtt ggc ggc gtc ctg 13959 
Asp Leu Glu Val Val Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu 
415 420 425 

gct gct ttg gcc gcg tat tgc ctg tca aca ggc tgc gtg gtc ata gtg 14007 
Ala Ala Leu Ala Ala Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val 
430 435 440 

ggc agg gtc gtc ttg tcc ggg aag ccg gca atc ata cct gac agg gaa 14055 
Gly Arg Val Val Leu Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu 
445 450 455 

gtc ctc tac cga gag ttc gat gag atg gaa gag tgc tct cag cac tta 14103 
Val Leu Tyr Arg Glu Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu 
460 465 470 475 

ccg tac atc gag caa ggg atg atg ctc gcc gag cag ttc aag cag aag 14151 
Pro Tyr Ile Glu Gln Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys 
480 485 490 

gcc ctc ggc ctc ctg cag acc gcg tcc cgt cag gca gag gtt atc gcc 14199 
Ala Leu Gly Leu Leu Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala 
495 500 505 

cct gct gtc cag acc aac tgg caa aaa ctc gag acc ttc tgg gcg aag 14247 
Pro Ala Val Gln Thr Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys 
510 515 520 

cat atg tgg aac ttc atc agt ggg ata caa tac ttg gcg ggc ttg tca 14295 
His Met Trp Asn Phe Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser 
525 530 535 

acg ctg cct ggt aac ccc gcc att gct tca ttg atg gct ttt aca gct 14343 
Thr Leu Pro Gly Asn Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala 
540 545 550 555 

gct gtc acc agc cca cta acc act agc caa acc ctc ctc ttc aac ata 14391 
Ala Val Thr Ser Pro Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile 
560 565 570 

ttg ggg ggg tgg gtg gct gcc cag ctc gcc gcc ccc ggt gcc gct act 14439 
Leu Gly Gly Trp Val Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr 
575 580 585 

gcc ttt gtg ggc gct ggc tta gct ggc gcc gcc atc ggc agt gtt gga 14487 
Ala Phe Val Gly Ala Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly 
590 595 600 

ctg ggg aag gtc ctc ata gac atc ctt gca ggg tat ggc gcg ggc gtg 14535 
Leu Gly Lys Val Leu Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val 
605 610 615 

gcg gga gct ctt gtg gca ttc aag atc atg agc ggt gag gtc ccc tcc 14583 
Ala Gly Ala Leu Val Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser 
620 625 630 635 

acg gag gac ctg gtc aat cta ctg ccc gcc atc ctc tcg ccc gga gcc 14631 
Thr Glu Asp Leu Val Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala 
640 645 650 
ctc gta gtc ggc gtg gtc tgt gca gca ata ctg cgc cgg cac gtt ggc 14679 
Leu Val Val Gly Val Val Cys Ala Ala Ile Leu Arg Arg His Val Gly 
655 660 665 

ccg ggc gag ggg gca gtg cag tgg atg aac cgg ctg ata gcc ttc gcc 14727 
Pro Gly Glu Gly Ala Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala 
670 675 680 

tcc cgg ggg aac cat gtt tcc ccc acg cac tac gtg ccg gag agc gat 14775 
Ser Arg Gly Asn His Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp 
685 690 695 

gca gct gcc cgc gtc act gcc ata ctc agc agc ctc act gta acc cag 14823 
Ala Ala Ala Arg Val Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln 
700 705 710 715 

ctc ctg agg cga ctg cac cag tgg ata agc tcg gag tgt acc act cca 14871 
Leu Leu Arg Arg Leu His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro 
720 725 730 

tgc tcc ggt tcc tgg cta agg gac atc tgg gac tgg ata tgc gag gtg 14919 
Cys Ser Gly Ser Trp Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val 
735 740 745 

ttg agc gac ttt aag acc tgg cta aaa gct aag ctc atg cca cag ctg 14967 
Leu Ser Asp Phe Lys Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu 
750 755 760 

cct ggg atc ccc ttt gtg tcc tgc cag cgc ggg tat aag ggg gtc tgg 15015 
Pro Gly Ile Pro Phe Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp 
765 770 775 

cga ggg gac ggc atc atg cac act cgc tgc cac tgt gga gct gag atc 15063 
Arg Gly Asp Gly Ile Met His Thr Arg Cys His Cys Gly Ala Glu Ile 
780 785 790 795 

act gga cat gtc aaa aac ggg acg atg agg atc gtc ggt cct agg acc 15111 
Thr Gly His Val Lys Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr 
800 805 810 

tgc agg aac atg tgg agt ggg acc ttc ccc att aat gcc tac acc acg 15159 
Cys Arg Asn Met Trp Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr 
815 820 825 

ggc ccc tgt acc ccc ctt cct gcg ccg aac tac acg ttc gcg cta tgg 15207 
Gly Pro Cys Thr Pro Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp 
830 835 840 

agg gtg tct gca gag gaa tac gtg gag ata agg cag gtg ggg gac ttc 15255 
Arg Val Ser Ala Glu Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe 
845 850 855 

cac tac gtg acg ggt atg act act gac aat ctt aaa tgc ccg tgc cag 15303 
His Tyr Val Thr Gly Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln 
860 865 870 875 

gtc cca tcg ccc gaa ttt ttc aca gaa ttg gac ggg gtg cgc cta cat 15351 
Val Pro Ser Pro Glu Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His 
880 885 890 
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agg ttt gcg ccc ccc tgc aag ccc ttg ctg egg gag gag gta tea ttc 
Arg Phe Ala Pro Pro Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe 
895 900 905 



15399 



aga gta gga ctc cac gaa tac ccg gta ggg tcg caa tta cct tgc gag 15447
Arg Val Gly Leu His Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu
910 915 920

ccc gaa ccg gac gtg gcc gtg ttg acg tcc atg ctc act gat ccc tcc 15495
Pro Glu Pro Asp Val Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser
925 930 935

cat ata aca gca gag gcg gcc ggg cga agg ttg gcg agg gga tca ccc 15543
His Ile Thr Ala Glu Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro
940 945 950 955

ccc tct gtg gcc agc tcc tcg gct agc cag cta tcc gct cca tct ctc 15591
Pro Ser Val Ala Ser Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu
960 965 970

aag gca act tgc acc gct aac cat gac tcc cct gat gct gag ctc ata 15639
Lys Ala Thr Cys Thr Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile
975 980 985

gag gcc aac ctc cta tgg agg cag gag atg ggc ggc aac atc acc agg 15687
Glu Ala Asn Leu Leu Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg
990 995 1000

gtt gag tca gaa aac aaa gtg gtg att ctg gac tcc ttc gat ccg ctt 15735
Val Glu Ser Glu Asn Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu
1005 1010 1015

gtg gcg gag gag gac gag cgg gag atc tcc gta ccc gca gaa atc ctg 15783
Val Ala Glu Glu Asp Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu
1020 1025 1030 1035

cgg aag tct cgg aga ttc gcc cag gcc ctg ccc gtt tgg gcg cgg ccg 15831
Arg Lys Ser Arg Arg Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro
1040 1045 1050

gac tat aac ccc ccg cta gtg gag acg tgg aaa aag ccc gac tac gaa 15879
Asp Tyr Asn Pro Pro Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu
1055 1060 1065

cca cct gtg gtc cat ggc tgc ccg ctt cca cct cca aag tcc cct cct 15927
Pro Pro Val Val His Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro
1070 1075 1080

gtg cct ccg cct cgg aag aag cgg acg gtg gtc ctc act gaa tca acc 15975
Val Pro Pro Pro Arg Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr
1085 1090 1095

cta tct act gcc ttg gcc gag ctc gcc acc aga agc ttt ggc agc tcc 16023
Leu Ser Thr Ala Leu Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser
1100 1105 1110 1115

tca act tcc ggc att acg ggc gac aat acg aca aca tcc tct gag ccc 16071
Ser Thr Ser Gly Ile Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro
1120 1125 1130

gcc cct tct ggc tgc ccc ccc gac tcc gac gct gag tcc tat tcc tcc 16119
Ala Pro Ser Gly Cys Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser
1135 1140 1145

atg ccc ccc ctg gag ggg gag cct ggg gat ccg gat ctt agc gac ggg 16167
Met Pro Pro Leu Glu Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly
1150 1155 1160

tca tgg tca acg gtc agt agt gag gcc aac gcg gag gat gtc gtg tgc 16215
Ser Trp Ser Thr Val Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys
1165 1170 1175

tgc tca atg tct tac tct tgg aca ggc gca ctc gtc acc ccg tgc gcc 16263
Cys Ser Met Ser Tyr Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala
1180 1185 1190 1195

gcg gaa gaa cag aaa ctg ccc atc aat gca cta agc aac tcg ttg cta 16311
Ala Glu Glu Gln Lys Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu
1200 1205 1210

cgt cac cac aat ttg gtg tat tcc acc acc tca cgc agt gct tgc caa 16359
Arg His His Asn Leu Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln
1215 1220 1225

agg cag aag aaa gtc aca ttt gac aga ctg caa gtt ctg gac agc cat 16407
Arg Gln Lys Lys Val Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His
1230 1235 1240

tac cag gac gta ctc aag gag gtt aaa gca gcg gcg tca aaa gtg aag 16455
Tyr Gln Asp Val Leu Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys
1245 1250 1255

gct aac ttg cta tcc gta gag gaa gct tgc agc ctg acg ccc cca cac 16503
Ala Asn Leu Leu Ser Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His
1260 1265 1270 1275

tca gcc aaa tcc aag ttt ggt tat ggg gca aaa gac gtc cgt tgc cat 16551
Ser Ala Lys Ser Lys Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His
1280 1285 1290

gcc aga aag gcc gta acc cac atc aac tcc gtg tgg aaa gac ctt ctg 16599
Ala Arg Lys Ala Val Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu
1295 1300 1305

gaa gac aat gta aca cca ata gac act acc atc atg gct aag aac gag 16647
Glu Asp Asn Val Thr Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu
1310 1315 1320

gtt ttc tgc gtt cag cct gag aag ggg ggt cgt aag cca gct cgt ctc 16695
Val Phe Cys Val Gln Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu
1325 1330 1335

atc gtg ttc ccc gat ctg ggc gtg cgc gtg tgc gaa aag atg gct ttg 16743
Ile Val Phe Pro Asp Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu
1340 1345 1350 1355

tac gac gtg gtt aca aag ctc ccc ttg gcc gtg atg gga agc tcc tac 16791
Tyr Asp Val Val Thr Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr
1360 1365 1370

gga ttc caa tac tca cca gga cag cgg gtt gaa ttc ctc gtg caa gcg 16839
Gly Phe Gln Tyr Ser Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala
1375 1380 1385

tgg aag tcc aag aaa acc cca atg ggg ttc tcg tat gat acc cgc tgc 16887
Trp Lys Ser Lys Lys Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys
1390 1395 1400

ttt gac tcc aca gtc act gag agc gac atc cgt acg gag gag gca atc 16935
Phe Asp Ser Thr Val Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile
1405 1410 1415

tac caa tgt tgt gac ctc gac ccc caa gcc cgc gtg gcc atc aag tcc 16983
Tyr Gln Cys Cys Asp Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser
1420 1425 1430 1435

ctc acc gag agg ctt tat gtt ggg ggc cct ctt acc aat tca agg ggg 17031
Leu Thr Glu Arg Leu Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly
1440 1445 1450

gag aac tgc ggc tat cgc agg tgc cgc gcg agc ggc gta ctg aca act 17079
Glu Asn Cys Gly Tyr Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr
1455 1460 1465

agc tgt ggt aac acc ctc act tgc tac atc aag gcc cgg gca gcc tgt 17127
Ser Cys Gly Asn Thr Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys
1470 1475 1480

cga gcc gca ggg ctc cag gac tgc acc atg ctc gtg tgt ggc gac gac 17175
Arg Ala Ala Gly Leu Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp
1485 1490 1495

tta gtc gtt atc tgt gaa agc gcg ggg gtc cag gag gac gcg gcg agc 17223
Leu Val Val Ile Cys Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser
1500 1505 1510 1515

ctg aga gcc ttc acg gag gct atg acc agg tac tcc gcc ccc cct ggg 17271
Leu Arg Ala Phe Thr Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly
1520 1525 1530

gac ccc cca caa cca gaa tac gac ttg gag ctc ata aca tca tgc tcc 17319
Asp Pro Pro Gln Pro Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser
1535 1540 1545

tcc aac gtg tca gtc gcc cac gac ggc gct gga aag agg gtc tac tac 17367
Ser Asn Val Ser Val Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr
1550 1555 1560

ctc acc cgt gac cct aca acc ccc ctc gcg aga gct gcg tgg gag aca 17415
Leu Thr Arg Asp Pro Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr
1565 1570 1575

gca aga cac act cca gtc aat tcc tgg cta ggc aac ata atc atg ttt 17463
Ala Arg His Thr Pro Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe
1580 1585 1590 1595

gcc ccc aca ctg tgg gcg agg atg ata ctg atg acc cat ttc ttt agc 17511
Ala Pro Thr Leu Trp Ala Arg Met Ile Leu Met Thr His Phe Phe Ser
1600 1605 1610

gtc ctt ata gcc agg gac cag ctt gaa cag gcc ctc gat tgc gag atc 17559
Val Leu Ile Ala Arg Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile
1615 1620 1625

tac ggg gcc tgc tac tcc ata gaa cca ctg gat cta cct cca atc att 17607
Tyr Gly Ala Cys Tyr Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile
1630 1635 1640

caa aga ctc cat ggc ctc agc gca ttt tca ctc cac agt tac tct cca 17655
Gln Arg Leu His Gly Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro
1645 1650 1655

ggt gaa atc aat agg gtg gcc gca tgc ctc aga aaa ctt ggg gta ccg 17703
Gly Glu Ile Asn Arg Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro
1660 1665 1670 1675

ccc ttg cga gct tgg aga cac cgg gcc cgg agc gtc cgc gct agg ctt 17751
Pro Leu Arg Ala Trp Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu
1680 1685 1690

ctg gcc aga gga ggc agg gct gcc ata tgt ggc aag tac ctc ttc aac 17799
Leu Ala Arg Gly Gly Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn
1695 1700 1705

tgg gca gta aga aca aag ctc aaa ctc act cca ata gcg gcc gct ggc 17847
Trp Ala Val Arg Thr Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly
1710 1715 1720

cag ctg gac ttg tcc ggc tgg ttc acg gct ggc tac agc ggg gga gac 17895
Gln Leu Asp Leu Ser Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp
1725 1730 1735

att tat cac agc gtg tct cat gcc cgg ccc cgc tgg atc tgg ttt tgc 17943
Ile Tyr His Ser Val Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys
1740 1745 1750 1755

cta ctc ctg ctt gct gca ggg gta ggc atc tac ctc ctc ccc aac cga 17991
Leu Leu Leu Leu Ala Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg
1760 1765 1770

atg agc acg aat cct aaa cct caa aga aag acc aaa cgt aac acc aac 18039
Met Ser Thr Asn Pro Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn
1775 1780 1785

cgg cgg ccg cag gac gtc aag ttc ccg ggt ggc ggt cag atc gtt ggt 18087
Arg Arg Pro Gln Asp Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly
1790 1795 1800

gga gtt tac ttg ttg ccg cgc agg ggc cct aga ttg ggt gtg cgc gcg 18135
Gly Val Tyr Leu Leu Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala
1805 1810 1815

acg aga aag act tcc gag cgg tcg caa cct cga ggt aga cgt cag cct 18183
Thr Arg Lys Thr Ser Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro
1820 1825 1830 1835

atc ccc aag gct cgt cgg ccc gag ggc agg acc tgg gct cag ccc ggg 18231
Ile Pro Lys Ala Arg Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly
1840 1845 1850

tac cct tgg ccc ctc tat ggc aat gag ggc tgc ggg tgg gcg gga tgg 18279
Tyr Pro Trp Pro Leu Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp
1855 1860 1865

ctc ctg tct ccc cgt ggc tct cgg cct agc tgg ggc ccc aca gac ccc 18327
Leu Leu Ser Pro Arg Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro
1870 1875 1880

cgg cgt agg tcg cgc aat ttg ggt aag gtc atc gat acc ctt acg tgc 18375
Arg Arg Arg Ser Arg Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys
1885 1890 1895

ggc ttc gcc gac ctc atg ggg tac ata ccg ctc gtc ggc gcc cct ctt 18423
Gly Phe Ala Asp Leu Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu
1900 1905 1910 1915

gga ggc gct gcc agg gcc taatagtcga ctttgttccc actgtacttt 18471
Gly Gly Ala Ala Arg Ala
1920

tagctcgtac aaaatacaat atacttttca tttctccgta aacaacatgt tttcccatgt 18531
aatatccttt tctatttttc gttccgttac caactttaca catactttat atagctattc 18591
acttctatac actaaaaaac taagacaatt ttaattttgc tgcctgccat atttcaattt 18651
gttataaatt cctataattt atcctattag tagctaaaaa aagatgaatg tgaatcgaat 18711
cctaagagaa ttggatctga tccacaggac gggtgtggtc gccatgatcg cgtagtcgat 18771
agtggctcca agtagcgaag cgagcaggac tgggcggcgg ccaaagcggt cggacagtgc 18831
tccgagaacg ggtgcgcata gaaattgcat caacgcatat agcgctagca gcacgccata 18891
gtgactggcg atgctgtcgg aatggacgat atcccgcaag aggcccggca gtaccggcat 18951
aaccaagcct atgcctacag catccagggt gacggtgccg aggatgacga tgagcgcatt 19011
gttagatttc atacacggtg cctgactgcg ttagcaattt aactgtgata aactaccgca 19071
ttaaagcttt ttctttccaa tttttttttt ttcgtcatta taaaaatcat tacgaccgag 19131
attcccgggt aataactgat ataattaaat tgaagctcta atttgtgagt ttagtataca 19191
tgcatttact tataatacag ttttttagtt ttgctggccg catcttctca aatatgcttc 19251
ccagcctgct tttctgtaac gttcaccctc taccttagca tcccttccct ttgcaaatag 19311
tcctcttcca acaataataa tgtcagatcc tgtagagacc acatcatcca cggttctata 19371
ctgttgaccc aatgcgtctc ccttgtcatc taaacccaca ccgggtgtca taatcaacca 19431
atcgtaacct tcatctcttc cacccatgtc tctttgagca ataaagccga taacaaaatc 19491
tttgtcgctc ttcgcaatgt caacagtacc cttagtatat tctccagtag atagggagcc 19551
cttgcatgac aattctgcta acatcaaaag gcctctaggt tcctttgtta cttcttctgc 19611
cgcctgcttc aaaccgctaa caatacctgg gcccaccaca ccgtgtgcat tcgtaatgtc 19671
tgcccattct gctattctgt atacacccgc agagtactgc aatttgactg tattaccaat 19731
gtcagcaaat tttctgtctt cgaagagtaa aaaattgtac ttggcggata atgcctttag 19791 
cggcttaact gtgccctcca tggaaaaatc agtcaagata tccacatgtg tttttagtaa 19851 
acaaattttg ggacctaatg cttcaactaa ctccagtaat tccttggtgg tacgaacatc 19911 
caatgaagca cacaagtttg tttgcttttc gtgcatgata ttaaatagct tggcagcaac 19971 
aggactagga tgagtagcag cacgttcctt atatgtagct ttcgacatga tttatcttcg 20031 
tttcctgcag gtttttgttc tgtgcagttg ggttaagaat actgggcaat ttcatgtttc 20091 
ttcaacacta catatgcgta tatataccaa tctaagtctg tgctccttcc ttcgttcttc 20151 
cttctgttcg gagattaccg aatcaaaaaa atttcaagga aaccgaaatc aaaaaaaaga 20211 
ataaaaaaaa aatgatgaat tgaaaagctt atcgat 20247 

<210> 19 
<211> 1921 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 
pd.delta.NS3NS5.pj.core150

<400> 19 

Met Ala Ala Tyr Ala Ala Gln Gly Tyr Lys Val Leu Val Leu Asn Pro
1 5 10 15

Ser Val Ala Ala Thr Leu Gly Phe Gly Ala Tyr Met Ser Lys Ala His
20 25 30

Gly Ile Asp Pro Asn Ile Arg Thr Gly Val Arg Thr Ile Thr Thr Gly
35 40 45

Ser Pro Ile Thr Tyr Ser Thr Tyr Gly Lys Phe Leu Ala Asp Gly Gly
50 55 60

Cys Ser Gly Gly Ala Tyr Asp Ile Ile Ile Cys Asp Glu Cys His Ser
65 70 75 80

Thr Asp Ala Thr Ser Ile Leu Gly Ile Gly Thr Val Leu Asp Gln Ala
85 90 95

Glu Thr Ala Gly Ala Arg Leu Val Val Leu Ala Thr Ala Thr Pro Pro
100 105 110

Gly Ser Val Thr Val Pro His Pro Asn Ile Glu Glu Val Ala Leu Ser
115 120 125

Thr Thr Gly Glu Ile Pro Phe Tyr Gly Lys Ala Ile Pro Leu Glu Val
130 135 140

Ile Lys Gly Gly Arg His Leu Ile Phe Cys His Ser Lys Lys Lys Cys
145 150 155 160

Asp Glu Leu Ala Ala Lys Leu Val Ala Leu Gly Ile Asn Ala Val Ala
165 170 175

Tyr Tyr Arg Gly Leu Asp Val Ser Val Ile Pro Thr Ser Gly Asp Val
180 185 190

Val Val Val Ala Thr Asp Ala Leu Met Thr Gly Tyr Thr Gly Asp Phe
195 200 205

Asp Ser Val Ile Asp Cys Asn Thr Cys Val Thr Gln Thr Val Asp Phe
210 215 220

Ser Leu Asp Pro Thr Phe Thr Ile Glu Thr Ile Thr Leu Pro Gln Asp
225 230 235 240

Ala Val Ser Arg Thr Gln Arg Arg Gly Arg Thr Gly Arg Gly Lys Pro
245 250 255

Gly Ile Tyr Arg Phe Val Ala Pro Gly Glu Arg Pro Ser Gly Met Phe
260 265 270

Asp Ser Ser Val Leu Cys Glu Cys Tyr Asp Ala Gly Cys Ala Trp Tyr
275 280 285

Glu Leu Thr Pro Ala Glu Thr Thr Val Arg Leu Arg Ala Tyr Met Asn
290 295 300

Thr Pro Gly Leu Pro Val Cys Gln Asp His Leu Glu Phe Trp Glu Gly
305 310 315 320

Val Phe Thr Gly Leu Thr His Ile Asp Ala His Phe Leu Ser Gln Thr
325 330 335

Lys Gln Ser Gly Glu Asn Leu Pro Tyr Leu Val Ala Tyr Gln Ala Thr
340 345 350

Val Cys Ala Arg Ala Gln Ala Pro Pro Pro Ser Trp Asp Gln Met Trp
355 360 365

Lys Cys Leu Ile Arg Leu Lys Pro Thr Leu His Gly Pro Thr Pro Leu
370 375 380

Leu Tyr Arg Leu Gly Ala Val Gln Asn Glu Ile Thr Leu Thr His Pro
385 390 395 400

Val Thr Lys Tyr Ile Met Thr Cys Met Ser Ala Asp Leu Glu Val Val
405 410 415

Thr Ser Thr Trp Val Leu Val Gly Gly Val Leu Ala Ala Leu Ala Ala
420 425 430

Tyr Cys Leu Ser Thr Gly Cys Val Val Ile Val Gly Arg Val Val Leu
435 440 445

Ser Gly Lys Pro Ala Ile Ile Pro Asp Arg Glu Val Leu Tyr Arg Glu
450 455 460

Phe Asp Glu Met Glu Glu Cys Ser Gln His Leu Pro Tyr Ile Glu Gln
465 470 475 480 

Gly Met Met Leu Ala Glu Gln Phe Lys Gln Lys Ala Leu Gly Leu Leu
485 490 495

Gln Thr Ala Ser Arg Gln Ala Glu Val Ile Ala Pro Ala Val Gln Thr
500 505 510

Asn Trp Gln Lys Leu Glu Thr Phe Trp Ala Lys His Met Trp Asn Phe
515 520 525

Ile Ser Gly Ile Gln Tyr Leu Ala Gly Leu Ser Thr Leu Pro Gly Asn
530 535 540

Pro Ala Ile Ala Ser Leu Met Ala Phe Thr Ala Ala Val Thr Ser Pro
545 550 555 560

Leu Thr Thr Ser Gln Thr Leu Leu Phe Asn Ile Leu Gly Gly Trp Val
565 570 575

Ala Ala Gln Leu Ala Ala Pro Gly Ala Ala Thr Ala Phe Val Gly Ala
580 585 590

Gly Leu Ala Gly Ala Ala Ile Gly Ser Val Gly Leu Gly Lys Val Leu
595 600 605

Ile Asp Ile Leu Ala Gly Tyr Gly Ala Gly Val Ala Gly Ala Leu Val
610 615 620

Ala Phe Lys Ile Met Ser Gly Glu Val Pro Ser Thr Glu Asp Leu Val
625 630 635 640

Asn Leu Leu Pro Ala Ile Leu Ser Pro Gly Ala Leu Val Val Gly Val
645 650 655

Val Cys Ala Ala Ile Leu Arg Arg His Val Gly Pro Gly Glu Gly Ala
660 665 670

Val Gln Trp Met Asn Arg Leu Ile Ala Phe Ala Ser Arg Gly Asn His
675 680 685

Val Ser Pro Thr His Tyr Val Pro Glu Ser Asp Ala Ala Ala Arg Val
690 695 700

Thr Ala Ile Leu Ser Ser Leu Thr Val Thr Gln Leu Leu Arg Arg Leu
705 710 715 720

His Gln Trp Ile Ser Ser Glu Cys Thr Thr Pro Cys Ser Gly Ser Trp
725 730 735

Leu Arg Asp Ile Trp Asp Trp Ile Cys Glu Val Leu Ser Asp Phe Lys
740 745 750

Thr Trp Leu Lys Ala Lys Leu Met Pro Gln Leu Pro Gly Ile Pro Phe
755 760 765

Val Ser Cys Gln Arg Gly Tyr Lys Gly Val Trp Arg Gly Asp Gly Ile
770 775 780

Met His Thr Arg Cys His Cys Gly Ala Glu Ile Thr Gly His Val Lys
785 790 795 800 

Asn Gly Thr Met Arg Ile Val Gly Pro Arg Thr Cys Arg Asn Met Trp
805 810 815

Ser Gly Thr Phe Pro Ile Asn Ala Tyr Thr Thr Gly Pro Cys Thr Pro
820 825 830

Leu Pro Ala Pro Asn Tyr Thr Phe Ala Leu Trp Arg Val Ser Ala Glu
835 840 845

Glu Tyr Val Glu Ile Arg Gln Val Gly Asp Phe His Tyr Val Thr Gly
850 855 860

Met Thr Thr Asp Asn Leu Lys Cys Pro Cys Gln Val Pro Ser Pro Glu
865 870 875 880

Phe Phe Thr Glu Leu Asp Gly Val Arg Leu His Arg Phe Ala Pro Pro
885 890 895

Cys Lys Pro Leu Leu Arg Glu Glu Val Ser Phe Arg Val Gly Leu His
900 905 910

Glu Tyr Pro Val Gly Ser Gln Leu Pro Cys Glu Pro Glu Pro Asp Val
915 920 925

Ala Val Leu Thr Ser Met Leu Thr Asp Pro Ser His Ile Thr Ala Glu
930 935 940

Ala Ala Gly Arg Arg Leu Ala Arg Gly Ser Pro Pro Ser Val Ala Ser
945 950 955 960

Ser Ser Ala Ser Gln Leu Ser Ala Pro Ser Leu Lys Ala Thr Cys Thr
965 970 975

Ala Asn His Asp Ser Pro Asp Ala Glu Leu Ile Glu Ala Asn Leu Leu
980 985 990

Trp Arg Gln Glu Met Gly Gly Asn Ile Thr Arg Val Glu Ser Glu Asn
995 1000 1005

Lys Val Val Ile Leu Asp Ser Phe Asp Pro Leu Val Ala Glu Glu Asp
1010 1015 1020

Glu Arg Glu Ile Ser Val Pro Ala Glu Ile Leu Arg Lys Ser Arg Arg
1025 1030 1035 1040

Phe Ala Gln Ala Leu Pro Val Trp Ala Arg Pro Asp Tyr Asn Pro Pro
1045 1050 1055

Leu Val Glu Thr Trp Lys Lys Pro Asp Tyr Glu Pro Pro Val Val His
1060 1065 1070

Gly Cys Pro Leu Pro Pro Pro Lys Ser Pro Pro Val Pro Pro Pro Arg
1075 1080 1085

Lys Lys Arg Thr Val Val Leu Thr Glu Ser Thr Leu Ser Thr Ala Leu
1090 1095 1100

Ala Glu Leu Ala Thr Arg Ser Phe Gly Ser Ser Ser Thr Ser Gly Ile
1105 1110 1115 1120

Thr Gly Asp Asn Thr Thr Thr Ser Ser Glu Pro Ala Pro Ser Gly Cys 
1125 1130 1135 

Pro Pro Asp Ser Asp Ala Glu Ser Tyr Ser Ser Met Pro Pro Leu Glu 
1140 1145 1150 

Gly Glu Pro Gly Asp Pro Asp Leu Ser Asp Gly Ser Trp Ser Thr Val 
1155 1160 1165 

Ser Ser Glu Ala Asn Ala Glu Asp Val Val Cys Cys Ser Met Ser Tyr 
1170 1175 1180 

Ser Trp Thr Gly Ala Leu Val Thr Pro Cys Ala Ala Glu Glu Gln Lys
1185 1190 1195 1200

Leu Pro Ile Asn Ala Leu Ser Asn Ser Leu Leu Arg His His Asn Leu
1205 1210 1215

Val Tyr Ser Thr Thr Ser Arg Ser Ala Cys Gln Arg Gln Lys Lys Val
1220 1225 1230

Thr Phe Asp Arg Leu Gln Val Leu Asp Ser His Tyr Gln Asp Val Leu
1235 1240 1245

Lys Glu Val Lys Ala Ala Ala Ser Lys Val Lys Ala Asn Leu Leu Ser
1250 1255 1260

Val Glu Glu Ala Cys Ser Leu Thr Pro Pro His Ser Ala Lys Ser Lys
1265 1270 1275 1280

Phe Gly Tyr Gly Ala Lys Asp Val Arg Cys His Ala Arg Lys Ala Val
1285 1290 1295

Thr His Ile Asn Ser Val Trp Lys Asp Leu Leu Glu Asp Asn Val Thr
1300 1305 1310

Pro Ile Asp Thr Thr Ile Met Ala Lys Asn Glu Val Phe Cys Val Gln
1315 1320 1325

Pro Glu Lys Gly Gly Arg Lys Pro Ala Arg Leu Ile Val Phe Pro Asp
1330 1335 1340

Leu Gly Val Arg Val Cys Glu Lys Met Ala Leu Tyr Asp Val Val Thr
1345 1350 1355 1360

Lys Leu Pro Leu Ala Val Met Gly Ser Ser Tyr Gly Phe Gln Tyr Ser
1365 1370 1375

Pro Gly Gln Arg Val Glu Phe Leu Val Gln Ala Trp Lys Ser Lys Lys
1380 1385 1390

Thr Pro Met Gly Phe Ser Tyr Asp Thr Arg Cys Phe Asp Ser Thr Val
1395 1400 1405

Thr Glu Ser Asp Ile Arg Thr Glu Glu Ala Ile Tyr Gln Cys Cys Asp
1410 1415 1420

Leu Asp Pro Gln Ala Arg Val Ala Ile Lys Ser Leu Thr Glu Arg Leu
1425 1430 1435 1440

Tyr Val Gly Gly Pro Leu Thr Asn Ser Arg Gly Glu Asn Cys Gly Tyr 
1445 1450 1455 

Arg Arg Cys Arg Ala Ser Gly Val Leu Thr Thr Ser Cys Gly Asn Thr 
1460 1465 1470 

Leu Thr Cys Tyr Ile Lys Ala Arg Ala Ala Cys Arg Ala Ala Gly Leu
1475 1480 1485

Gln Asp Cys Thr Met Leu Val Cys Gly Asp Asp Leu Val Val Ile Cys
1490 1495 1500

Glu Ser Ala Gly Val Gln Glu Asp Ala Ala Ser Leu Arg Ala Phe Thr
1505 1510 1515 1520

Glu Ala Met Thr Arg Tyr Ser Ala Pro Pro Gly Asp Pro Pro Gln Pro
1525 1530 1535

Glu Tyr Asp Leu Glu Leu Ile Thr Ser Cys Ser Ser Asn Val Ser Val
1540 1545 1550

Ala His Asp Gly Ala Gly Lys Arg Val Tyr Tyr Leu Thr Arg Asp Pro
1555 1560 1565

Thr Thr Pro Leu Ala Arg Ala Ala Trp Glu Thr Ala Arg His Thr Pro
1570 1575 1580

Val Asn Ser Trp Leu Gly Asn Ile Ile Met Phe Ala Pro Thr Leu Trp
1585 1590 1595 1600

Ala Arg Met Ile Leu Met Thr His Phe Phe Ser Val Leu Ile Ala Arg
1605 1610 1615

Asp Gln Leu Glu Gln Ala Leu Asp Cys Glu Ile Tyr Gly Ala Cys Tyr
1620 1625 1630

Ser Ile Glu Pro Leu Asp Leu Pro Pro Ile Ile Gln Arg Leu His Gly
1635 1640 1645

Leu Ser Ala Phe Ser Leu His Ser Tyr Ser Pro Gly Glu Ile Asn Arg
1650 1655 1660

Val Ala Ala Cys Leu Arg Lys Leu Gly Val Pro Pro Leu Arg Ala Trp
1665 1670 1675 1680

Arg His Arg Ala Arg Ser Val Arg Ala Arg Leu Leu Ala Arg Gly Gly 
1685 1690 1695 

Arg Ala Ala Ile Cys Gly Lys Tyr Leu Phe Asn Trp Ala Val Arg Thr
1700 1705 1710

Lys Leu Lys Leu Thr Pro Ile Ala Ala Ala Gly Gln Leu Asp Leu Ser
1715 1720 1725

Gly Trp Phe Thr Ala Gly Tyr Ser Gly Gly Asp Ile Tyr His Ser Val
1730 1735 1740

Ser His Ala Arg Pro Arg Trp Ile Trp Phe Cys Leu Leu Leu Leu Ala
1745 1750 1755 1760

Ala Gly Val Gly Ile Tyr Leu Leu Pro Asn Arg Met Ser Thr Asn Pro
1765 1770 1775

Lys Pro Gln Arg Lys Thr Lys Arg Asn Thr Asn Arg Arg Pro Gln Asp
1780 1785 1790

Val Lys Phe Pro Gly Gly Gly Gln Ile Val Gly Gly Val Tyr Leu Leu
1795 1800 1805

Pro Arg Arg Gly Pro Arg Leu Gly Val Arg Ala Thr Arg Lys Thr Ser
1810 1815 1820

Glu Arg Ser Gln Pro Arg Gly Arg Arg Gln Pro Ile Pro Lys Ala Arg
1825 1830 1835 1840

Arg Pro Glu Gly Arg Thr Trp Ala Gln Pro Gly Tyr Pro Trp Pro Leu
1845 1850 1855

Tyr Gly Asn Glu Gly Cys Gly Trp Ala Gly Trp Leu Leu Ser Pro Arg
1860 1865 1870

Gly Ser Arg Pro Ser Trp Gly Pro Thr Asp Pro Arg Arg Arg Ser Arg
1875 1880 1885

Asn Leu Gly Lys Val Ile Asp Thr Leu Thr Cys Gly Phe Ala Asp Leu
1890 1895 1900

Met Gly Tyr Ile Pro Leu Val Gly Ala Pro Leu Gly Gly Ala Ala Arg
1905 1910 1915 1920

Ala